  • DocumentCode
    2292804
  • Title
    A supervised visual wrapper generator for Web-data extraction
  • Author
    Meng, Xiaofeng; Wang, Haiyan; Hu, Dongdong; Li, Chen
  • Author_Institution
    Sch. of Inf., Renmin Univ. of China, Beijing, China
  • fYear
    2003
  • fDate
    3-6 Nov. 2003
  • Firstpage
    657
  • Lastpage
    662
  • Abstract
    Extracting data from Web pages using wrappers is a fundamental problem that arises in a wide variety of applications of great practical interest. In this paper, we propose a novel schema-guided approach to wrapper generation. We provide a user-friendly interface that allows users to define the schema of the data to be extracted and to specify mappings from an HTML page to the target schema. Based on these mappings, the system can automatically generate an extraction rule to extract data from the page. Our approach can significantly reduce the human effort required in wrapper generation. Moreover, the user never has to deal with the internal extraction rule, or even be familiar with the details of HTML.
  • Keywords
    Web sites; hypermedia markup languages; information retrieval; online front-ends; user interfaces; HTML page; Web pages; Web-data extraction; schema-guided approach; supervised visual wrapper generator; user-friendly interface; wrapper generation; Computer languages; Data mining; HTML; Humans; Information management; Machine learning
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer Software and Applications Conference, 2003. COMPSAC 2003. Proceedings. 27th Annual International
  • ISSN
    0730-3157
  • Print_ISBN
    0-7695-2020-0
  • Type
    conf
  • DOI
    10.1109/CMPSAC.2003.1245412
  • Filename
    1245412