DocumentCode :
456352
Title :
A Method for Information Extraction from the Web
Author :
Nachouki, Gilles
Author_Institution :
Faculte des Sci. et des Techniques, LINA, Nantes
Volume :
1
fYear :
2006
fDate :
0-0 0
Firstpage :
517
Lastpage :
521
Abstract :
Many data sources are available on the Web today, such as product catalogs, conferences, and various directories. Extracting information from their content is a hard task because these sources are heterogeneous and dynamic. This paper presents a new method for extracting wrappers and relations from the Web. It combines the discovery of similarities in the structure of the data that a user wishes to extract from a given Web page with the generalization of the contexts of the extracted data. The method is now implemented in MDSManager, our system for data source fusion.
Keywords :
Internet; information retrieval; MDSManager; Web information extraction; Web pages; data extraction; Books; Catalogs; Data mining; HTML; Search methods; Tail; World Wide Web; XML; Information extraction; Web; data sources fusion
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Information and Communication Technologies, 2006. ICTTA '06. 2nd
Conference_Location :
Damascus
Print_ISBN :
0-7803-9521-2
Type :
conf
DOI :
10.1109/ICTTA.2006.1684424
Filename :
1684424