DocumentCode :
3250532
Title :
Adapting information extraction knowledge for unseen Web sites
Author :
Wong, Tak-Lam ; Lam, Wai
Author_Institution :
Dept. of Syst. Eng. & Eng. Manage., Chinese Univ. of Hong Kong, China
fYear :
2002
fDate :
2002
Firstpage :
506
Lastpage :
513
Abstract :
We propose a wrapper adaptation framework which aims at adapting a learned wrapper to an unseen Web site. It significantly reduces human effort in constructing wrappers. Our framework makes use of extraction rules previously discovered from a particular site to seek potential training example candidates for an unseen site. Rule generalization and text categorization are employed for finding suitable example candidates. Another feature of our approach is that it makes use of the previously discovered lexicon to classify good training examples automatically for the new site. We conducted extensive experiments to evaluate the quality of the extraction performance and the adaptability of our approach.
Keywords :
Web sites; data mining; learning (artificial intelligence); extraction rules; information extraction knowledge; lexicon; rule generalization; text categorization; unseen Web site; wrapper adaptation framework; Automation; Data mining; Humans; Information retrieval; Keyword search; Natural languages; Research and development management; Systems engineering and theory; Text categorization; Web pages;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Data Mining, 2002. ICDM 2002. Proceedings. 2002 IEEE International Conference on
Print_ISBN :
0-7695-1754-4
Type :
conf
DOI :
10.1109/ICDM.2002.1183995
Filename :
1183995