A Method for Information Extraction from the Web

Author

Nachouki, Gilles

Author_Institution

Faculte des Sci. et des Techniques, LINA, Nantes

Volume

fYear

Firstpage

517

Lastpage

521

Abstract

Many data sources are available on the Web today, such as product catalogs, conferences, and multiple directories. Extracting information from their content is a hard task because the sources are heterogeneous and dynamic. This paper presents a new method for extracting wrappers and relations from the Web. It combines the discovery of similarities in the structures of the data that a user wishes to extract from a given Web page with the generalization of the contexts of the extracted data. The method is implemented in MDSManager, our system for data source fusion.

Keywords

Internet; information retrieval; MDSManager; Web information extraction; Web pages; data extraction; Books; Catalogs; Data mining; HTML; Search methods; Tail; World Wide Web; XML; Information extraction; Web; data sources fusion

fLanguage

English

Publisher

IEEE

Conference_Titel

Information and Communication Technologies, 2006. ICTTA '06. 2nd

Conference_Location

Damascus

Print_ISBN

0-7803-9521-2

Type

conf

DOI

10.1109/ICTTA.2006.1684424

Filename

1684424

Link To Document

https://search.isc.ac/dl/search/defaultta.aspx?DTC=49&DC=456352