  • DocumentCode
    3250532
  • Title
    Adapting information extraction knowledge for unseen Web sites
  • Author
    Wong, Tak-Lam; Lam, Wai
  • Author_Institution
    Dept. of Syst. Eng. & Eng. Manage., Chinese Univ. of Hong Kong, China
  • fYear
    2002
  • fDate
    2002
  • Firstpage
    506
  • Lastpage
    513
  • Abstract
    We propose a wrapper adaptation framework that aims to adapt a wrapper learned from one Web site to an unseen Web site, significantly reducing the human effort needed to construct wrappers. The framework uses extraction rules previously discovered from a particular site to seek potential training example candidates for the unseen site. Rule generalization and text categorization are employed to find suitable example candidates. Another feature of our approach is that it uses the previously discovered lexicon to automatically classify good training examples for the new site. We conducted extensive experiments to evaluate the extraction performance and the adaptability of our approach.
  • Keywords
    Web sites; data mining; learning (artificial intelligence); extraction rules; information extraction knowledge; lexicon; rule generalization; text categorization; unseen Web site; wrapper adaptation framework; Automation; Data mining; Humans; Information retrieval; Keyword search; Natural languages; Research and development management; Systems engineering and theory; Text categorization; Web pages;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Mining, 2002. ICDM 2002. Proceedings. 2002 IEEE International Conference on
  • Print_ISBN
    0-7695-1754-4
  • Type
    conf
  • DOI
    10.1109/ICDM.2002.1183995
  • Filename
    1183995