• DocumentCode
    2183747
  • Title

    Automatically generating labeled examples for Web wrapper maintenance

  • Author

    Raposo, Juan ; Pan, Alberto ; Alvarez, Manuel ; Hidalgo, Justo

  • Author_Institution
    A Coruna Univ., Spain
  • fYear
    2005
  • fDate
    19-22 Sept. 2005
  • Firstpage
    250
  • Lastpage
    256
  • Abstract
    For software programs to benefit fully from semi-structured Web sources, wrapper programs must be built to provide a "machine readable" view over them. A significant problem with this approach is that, since Web sources are autonomous, they may undergo changes that invalidate the current wrapper. In this paper, we address this problem by introducing novel heuristics and algorithms for automatically maintaining wrappers. In our approach, the system collects some query results during normal wrapper operation and, when the source changes, uses them as input to generate a set of labeled examples for the source, which can then be used to induce a new wrapper. Our experiments show that the proposed techniques achieve high accuracy for a wide range of real-world Web data extraction problems.
  • Keywords
    Internet; software maintenance; Web data extraction; Web source; Web wrapper maintenance; machine readable; Application software; Data mining; Databases; Heuristic algorithms; Induction generators
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Web Intelligence, 2005. Proceedings. The 2005 IEEE/WIC/ACM International Conference on
  • Print_ISBN
    0-7695-2415-X
  • Type

    conf

  • DOI
    10.1109/WI.2005.40
  • Filename
    1517850