Title :
Automatic Annotation for the Generation of Extraction Rules
Author :
Shi, Yufei ; Chen, Rong
Author_Institution :
Coll. of Informational Sci. & Technol., Dalian Maritime Univ., Dalian, China
Abstract :
Current Web information extraction systems are supervised systems that require manual annotation of training instances in order to learn extraction rules. This annotation is tedious and is subject to change when Web sites are upgraded. In this paper, we present a finite-state-transducer-based method of automatic annotation that can deal with pages containing missing attributes, multiple-valued attributes, and multi-ordering attributes. We also augment it with probability theory to reduce the uncertainty of the state machine. The experimental results show that our algorithm annotates Web pages efficiently and accurately, and thus speeds up extraction-rule learning in Web information extraction systems.
Keywords :
Web sites; data mining; finite state machines; information retrieval systems; learning (artificial intelligence); probability; uncertainty handling; Web information extraction system; Website; automatic annotation; extraction rules generation; extraction rules learning; finite state transducer based method; probability theory; state machine; supervised system; Books; Data mining; Logic gates; Particle separators; Training; Transducers; Web pages;
Conference_Title :
Management and Service Science (MASS), 2010 International Conference on
Conference_Location :
Wuhan
Print_ISBN :
978-1-4244-5325-2
Electronic_ISBN :
978-1-4244-5326-9
DOI :
10.1109/ICMSS.2010.5575684