DocumentCode
456352
Title
A Method for Information Extraction from the Web
Author
Nachouki, Gilles
Author_Institution
Faculte des Sci. et des Techniques, LINA, Nantes
Volume
1
fYear
0
fDate
0-0 0
Firstpage
517
Lastpage
521
Abstract
Many data sources are available on the Web today, such as product catalogs, conferences, and various directories. Extracting information from their content is a hard task because these sources are heterogeneous and dynamic. This paper presents a new method for extracting wrappers and relations from the Web. The method combines the discovery of similarities in the structures of the data that a user wishes to extract from a given Web page with the generalization of the contexts of the extracted data. The method is implemented in MDSManager, our system for data source fusion.
Keywords
Internet; information retrieval; MDSManager; Web information extraction; Web pages; data extraction; Books; Catalogs; Data mining; HTML; Search methods; Tail; World Wide Web; XML; Information extraction; Web; data sources fusion
fLanguage
English
Publisher
IEEE
Conference_Titel
Information and Communication Technologies, 2006. ICTTA '06. 2nd
Conference_Location
Damascus
Print_ISBN
0-7803-9521-2
Type
conf
DOI
10.1109/ICTTA.2006.1684424
Filename
1684424