• DocumentCode
    2382883
  • Title

    An approach based on extracted data for wrapper maintenance

  • Author

    Luo, Wei ; Li, Qingzhong ; Ding, Yanhui

  • Author_Institution
    Sch. of Comput. Sci. & Technol., ShanDong Univ., Jinan, China
  • fYear
    2010
  • fDate
    1-3 Dec. 2010
  • Firstpage
    88
  • Lastpage
    92
  • Abstract
    Extracting data from Web pages using wrappers is a fundamental problem arising in a wide variety of applications of great practical interest. Web data extraction raises two main issues: wrapper generation and wrapper maintenance. In this paper, we propose a novel approach to automatic wrapper maintenance. It is based on the observation that, despite various page changes, many important features of the pages are preserved, such as the syntactic patterns, annotations, and content of the extracted data items. The approach uses these preserved features to locate the desired values in the changed pages, so that the wrappers can be repaired. Experiments on real Web sites show that the proposed approach can effectively maintain wrappers so that they continue to extract the desired data accurately.
  • Keywords
    Web sites; Web data extraction; Web page; Web site; automatic wrapper maintenance; wrapper generation; wrapper maintenance; Web data integration
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Pervasive Computing and Applications (ICPCA), 2010 5th International Conference on
  • Conference_Location
    Maribor
  • Print_ISBN
    978-1-4244-9144-5
  • Type

    conf

  • DOI
    10.1109/ICPCA.2010.5704080
  • Filename
    5704080