• DocumentCode
    3260306
  • Title

    Understanding the Web Page Layout

  • Author

    Zhou, Minghong ; Li, Rubao ; Li, Wei

  • Author_Institution
    Inst. of Comput. Technol., Chinese Acad. of Sci.
  • fYear
    2006
  • fDate
    Dec. 2006
  • Firstpage
    438
  • Lastpage
    442
  • Abstract
    Web pages express their semantics not only through free text, but also through their layout. While information is explicitly encoded in free text, the layout implicitly reveals the semantic relationships among the free texts. In this paper, we propose a framework for mining the semantics implied by the layout. The core of our work is a new HTML document model, called the nested table model, which synthesizes the DOM model and the syntax of the HTML language. Using the nested table model, we can formally define the relevancy of free texts, and hence free texts can be grouped by their relevancy. Our experimental results indicate that the relevancy correctly reflects the semantics of Web page layout.
  • Keywords
    Internet; data mining; hypermedia markup languages; DOM model; HTML document model; Web page layout; free text relevancy; nested table model; semantic mining; semantical relationships; Computers; Conferences; Context modeling; Data mining; HTML; Relational databases; Search engines; Temperature; Weather forecasting; Web pages;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Mining Workshops, 2006. ICDM Workshops 2006. Sixth IEEE International Conference on
  • Conference_Location
    Hong Kong
  • Print_ISBN
    0-7695-2702-7
  • Type

    conf

  • DOI
    10.1109/ICDMW.2006.163
  • Filename
    4063667