DocumentCode
2136532
Title
Automatic Annotation for the Generation of Extraction Rules
Author
Shi, Yufei ; Chen, Rong
Author_Institution
Coll. of Informational Sci. & Technol., Dalian Maritime Univ., Dalian, China
fYear
2010
fDate
24-26 Aug. 2010
Firstpage
1
Lastpage
5
Abstract
Current Web information extraction systems are supervised systems that require manual annotation of training instances in order to learn extraction rules. The annotation is tedious and must be redone when Web sites are upgraded. In this paper, we present a finite-state-transducer-based method of automatic annotation, which can deal with pages with missing attributes, multiple-valued attributes, and multi-ordering attributes. Moreover, we augment it with probability theory to reduce the uncertainty of the state machine. The experimental results show that our algorithm can annotate Web pages efficiently and accurately and thus speed up the learning of extraction rules in Web information extraction systems.
Keywords
Web sites; data mining; finite state machines; information retrieval systems; learning (artificial intelligence); probability; uncertainty handling; Web information extraction system; Website; automatic annotation; extraction rules generation; extraction rules learning; finite state transducer based method; probability theory; state machine; supervised system; Books; Data mining; Logic gates; Particle separators; Training; Transducers; Web pages;
fLanguage
English
Publisher
ieee
Conference_Titel
Management and Service Science (MASS), 2010 International Conference on
Conference_Location
Wuhan
Print_ISBN
978-1-4244-5325-2
Electronic_ISBN
978-1-4244-5326-9
Type
conf
DOI
10.1109/ICMSS.2010.5575684
Filename
5575684