DocumentCode :
2315811
Title :
Facilitating wrapper generation with page analysis
Author :
Wu, Bo ; Cheng, Xueqi ; Wang, Yu ; Zhang, Gang ; Ding, Guodong
Author_Institution :
Inst. of Comput. Technol., Chinese Acad. of Sci., Beijing
fYear :
2009
fDate :
8-11 June 2009
Firstpage :
191
Lastpage :
193
Abstract :
Current approaches to generating wrappers for web page extraction require a large number of labeled training pages to obtain satisfactory results. Fully automatic methods, on the other hand, extract data of unreliable quality. In this paper, we propose a novel method that facilitates wrapper generation by combining wrapper induction with page analysis. In addition to manually labeled data, we take advantage of a set of unlabeled pages to improve the quality of the induced wrappers. Our experiments demonstrate that the system achieves satisfactory results with fewer manually labeled training pages.
Keywords :
Internet; information retrieval; text analysis; labeled training pages; page analysis; web page extraction; wrapper generation; Classification tree analysis; Computers; Data mining; Humans; Induction generators; Intersymbol interference; Labeling; Skeleton; USA Councils; Web pages; information extraction; web mining; wrapper
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Intelligence and Security Informatics, 2009. ISI '09. IEEE International Conference on
Conference_Location :
Dallas, TX
Print_ISBN :
978-1-4244-4171-6
Electronic_ISBN :
978-1-4244-4173-0
Type :
conf
DOI :
10.1109/ISI.2009.5137299
Filename :
5137299