• DocumentCode
    2860336
  • Title

    Tree-Structured Template Generation for Web Pages

  • Author

    Chuang, Shui-Lung ; Hsu, Jane Yung-jen

  • Author_Institution
    Academia Sinica, Taiwan
  • fYear
    2004
  • fDate
    20-24 Sept. 2004
  • Firstpage
    327
  • Lastpage
    333
  • Abstract
    As the Web becomes an increasingly important source of information, tools for modeling, searching, and extracting information from Web pages are indispensable. By modeling the structure of a Web page defined by its markup tags, one can easily extract target information using structural templates. This paper introduces the Tree Template Automatic Generator (TTAG) that learns tree-structured templates from training Web pages. TTAG was applied to both query-based and frequently updated Web sites, and produced effective templates from a small number of examples. The experiments show that TTAG is a powerful extraction tool for semi-structured information sources.
  • Keywords
    Automata; Data mining; Databases; HTML; Information resources; Information science; Internet; Power generation; Seminars; Web pages
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Web Intelligence, 2004. WI 2004. Proceedings. IEEE/WIC/ACM International Conference on
  • Print_ISBN
    0-7695-2100-2
  • Type

    conf

  • DOI
    10.1109/WI.2004.10101
  • Filename
    1410822