DocumentCode
3250532
Title
Adapting information extraction knowledge for unseen Web sites
Author
Wong, Tak-Lam ; Lam, Wai
Author_Institution
Dept. of Syst. Eng. & Eng. Manage., Chinese Univ. of Hong Kong, China
fYear
2002
fDate
2002
Firstpage
506
Lastpage
513
Abstract
We propose a wrapper adaptation framework which aims at adapting a learned wrapper to an unseen Web site. It significantly reduces human effort in constructing wrappers. Our framework makes use of extraction rules previously discovered from a particular site to seek potential training example candidates for an unseen site. Rule generalization and text categorization are employed for finding suitable example candidates. Another feature of our approach is that it makes use of the previously discovered lexicon to classify good training examples automatically for the new site. We conducted extensive experiments to evaluate the quality of the extraction performance and the adaptability of our approach.
Keywords
Web sites; data mining; learning (artificial intelligence); extraction rules; information extraction knowledge; lexicon; rule generalization; text categorization; unseen Web site; wrapper adaptation framework; Automation; Data mining; Humans; Information retrieval; Keyword search; Natural languages; Research and development management; Systems engineering and theory; Text categorization; Web pages
fLanguage
English
Publisher
IEEE
Conference_Titel
Data Mining, 2002. ICDM 2002. Proceedings. 2002 IEEE International Conference on
Print_ISBN
0-7695-1754-4
Type
conf
DOI
10.1109/ICDM.2002.1183995
Filename
1183995