Title :
Adapting information extraction knowledge for unseen Web sites
Author :
Wong, Tak-Lam ; Lam, Wai
Author_Institution :
Dept. of Syst. Eng. & Eng. Manage., Chinese Univ. of Hong Kong, China
Abstract :
We propose a wrapper adaptation framework which aims at adapting a learned wrapper to an unseen Web site. It significantly reduces human effort in constructing wrappers. Our framework makes use of extraction rules previously discovered from a particular site to seek potential training example candidates for an unseen site. Rule generalization and text categorization are employed for finding suitable example candidates. Another feature of our approach is that it makes use of the previously discovered lexicon to classify good training examples automatically for the new site. We conducted extensive experiments to evaluate the quality of the extraction performance and the adaptability of our approach.
Keywords :
Web sites; data mining; learning (artificial intelligence); extraction rules; information extraction knowledge; lexicon; rule generalization; text categorization; unseen Web site; wrapper adaptation framework; Automation; Data mining; Humans; Information retrieval; Keyword search; Natural languages; Research and development management; Systems engineering and theory; Text categorization; Web pages;
Conference_Title :
Data Mining, 2002. ICDM 2002. Proceedings. 2002 IEEE International Conference on
Print_ISBN :
0-7695-1754-4
DOI :
10.1109/ICDM.2002.1183995