• DocumentCode
    2136532
  • Title
    Automatic Annotation for the Generation of Extraction Rules
  • Author
    Shi, Yufei ; Chen, Rong
  • Author_Institution
    Coll. of Informational Sci. & Technol., Dalian Maritime Univ., Dalian, China
  • fYear
    2010
  • fDate
    24-26 Aug. 2010
  • Firstpage
    1
  • Lastpage
    5
  • Abstract
    Current Web information extraction systems are supervised systems that require manual annotation of training instances in order to learn extraction rules. The annotation is tedious and subject to change when Web sites are upgraded. In this paper, we present a finite-state-transducer-based method of automatic annotation that can deal with pages containing missing attributes, multiple-valued attributes, and multi-ordering attributes. Moreover, we augment it with probability theory to reduce the uncertainty of the state machine. The experimental results show that our algorithm can annotate Web pages efficiently and accurately and thus speed up the learning of extraction rules in Web information extraction systems.
  • Keywords
    Web sites; data mining; finite state machines; information retrieval systems; learning (artificial intelligence); probability; uncertainty handling; Web information extraction system; Website; automatic annotation; extraction rules generation; extraction rules learning; finite state transducer based method; probability theory; state machine; supervised system; Books; Data mining; Logic gates; Particle separators; Training; Transducers; Web pages;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Management and Service Science (MASS), 2010 International Conference on
  • Conference_Location
    Wuhan
  • Print_ISBN
    978-1-4244-5325-2
  • Electronic_ISBN
    978-1-4244-5326-9
  • Type
    conf
  • DOI
    10.1109/ICMSS.2010.5575684
  • Filename
    5575684