DocumentCode :
430254
Title :
Context Generalization for Information Extraction from the Web
Author :
Habegger, Benjamin ; Quafafou, Mohamed
Author_Institution :
LINA, France
fYear :
2004
fDate :
20-24 Sept. 2004
Firstpage :
720
Lastpage :
723
Abstract :
Many online data sources, such as product catalogs and online directories, are available on the web. Extracting information from such sources is hard because they are designed to be presented to human users. Many researchers have tackled the problem of building wrappers for such sources. The state-of-the-art approach is to use machine learning techniques based on fully labeled example pages. In this paper we propose and study an approach based on example instances. This allows the user to build a wrapper using only a handful of examples drawn from the whole source, which makes it possible to take structural differences into account. The patterns obtained make it possible to extract the instances of the relation described by the examples and contained in the same data source.
Keywords :
Application software; Buildings; Catalogs; Data mining; Humans; Induction generators; Labeling; Machine learning
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Web Intelligence, 2004. WI 2004. Proceedings. IEEE/WIC/ACM International Conference on
Print_ISBN :
0-7695-2100-2
Type :
conf
DOI :
10.1109/WI.2004.10076
Filename :
1410905