• DocumentCode
    3023130
  • Title

    A learning approach to discovering Web page semantic structures

  • Author

    Feng, Junlan ; Haffner, Patrick ; Gilbert, Mazin

  • fYear
    2005
  • fDate
    29 Aug.-1 Sept. 2005
  • Firstpage
    1055
  • Abstract
    This paper proposes a learning approach for discovering the semantic structure of Web pages. The task includes partitioning the text on a Web page into information blocks and identifying their semantic categories. We employed two machine learning techniques, AdaBoost and SVMs, to learn from a labeled Web page corpus. We evaluated our approach on general Web pages from the World Wide Web and obtained encouraging results. This work can benefit a number of Web-driven applications, such as search engines, Web-based question answering, Web-based data mining, and voice-enabled Web navigation.
  • Keywords
    Web sites; learning (artificial intelligence); support vector machines; text analysis; Adaboost; Web navigation; Web page semantic structure; Web-based data mining; Web-based question answering; World Wide Web; machine learning; search engine; semantic category; support vector machine; text partitioning; Data mining; HTML; Humans; Image segmentation; Machine learning; Navigation; Partitioning algorithms; Search engines; Web pages
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Document Analysis and Recognition, 2005. Proceedings. Eighth International Conference on
  • ISSN
    1520-5263
  • Print_ISBN
    0-7695-2420-6
  • Type

    conf

  • DOI
    10.1109/ICDAR.2005.19
  • Filename
    1575705