DocumentCode
3621758
Title
Learning Logic Wrappers for Information Extraction from the Web
Author
C. Badica;E. Popescu;A. Badica
Author_Institution
University of Craiova
fYear
2005
fDate
2005
Firstpage
336
Lastpage
339
Abstract
This paper discusses a methodology for applying general-purpose first-order inductive learning to extract information from Web documents structured as unranked ordered trees. The methodology is applied to real-world sets of HTML pages that represent product information sheets; extracting information from such pages is an important task in product data integration. It addresses two problems: defining information extraction rules in the form of logic wrappers, and mapping the task of learning these rules to general-purpose first-order inductive learning.
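As a rough illustration of the idea in the abstract, the sketch below flattens a small unranked ordered tree into relational facts and applies one hand-written extraction rule over them. The relation names (`tag`, `child`, `next`), the `Node` class, the toy page, and the rule itself are assumptions made for this example; they are not taken from the paper, and no learning step is shown.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Node:
    """A node of an unranked ordered tree: any number of ordered children."""
    tag: str
    children: list = field(default_factory=list)


def tree_to_facts(root):
    """Flatten the tree into (relation, arg1, arg2) facts, numbering nodes in pre-order."""
    ids = count(0)
    facts = []

    def visit(node):
        node_id = next(ids)
        facts.append(("tag", node_id, node.tag))
        previous = None
        for child in node.children:
            child_id = visit(child)
            facts.append(("child", node_id, child_id))
            if previous is not None:
                # "next" links a node to its immediate right sibling.
                facts.append(("next", previous, child_id))
            previous = child_id
        return node_id

    visit(root)
    return facts


def extract(facts):
    """Hypothetical wrapper rule, written by hand rather than learned:
    extract(N) :- tag(N, text), child(P, N), tag(P, td), next(Q, P), tag(Q, th).
    """
    tags = {n: t for kind, n, t in facts if kind == "tag"}
    parent = {c: p for kind, p, c in facts if kind == "child"}
    left_sibling = {b: a for kind, a, b in facts if kind == "next"}
    hits = []
    for n, t in tags.items():
        p = parent.get(n)
        if t == "text" and p is not None and tags[p] == "td":
            q = left_sibling.get(p)
            if q is not None and tags[q] == "th":
                hits.append(n)
    return sorted(hits)


# Toy product sheet: a table with two rows, each a header cell and a value cell.
def row():
    return Node("tr", [Node("th", [Node("text")]), Node("td", [Node("text")])])


page = Node("table", [row(), row()])
facts = tree_to_facts(page)
extracted = extract(facts)
```

In this encoding, a general-purpose first-order learner would receive the facts as background knowledge and a set of annotated nodes as positive examples, and would be asked to induce a clause of the same shape as the one hard-coded in `extract`.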
Keywords
"Data mining","HTML","Logic programming","Software engineering","Information systems","Humans","Information filtering","Information filters","Natural languages","XML"
Publisher
IEEE
Conference_Titel
The 2005 Symposium on Applications and the Internet Workshops (SAINT Workshops 2005)
Print_ISBN
0-7695-2263-7
Type
conf
DOI
10.1109/SAINTW.2005.1620043
Filename
1620043