DocumentCode
3260306
Title
Understanding the Web Page Layout
Author
Zhou, Minghong ; Li, Rubao ; Li, Wei
Author_Institution
Inst. of Comput. Technol., Chinese Acad. of Sci.
fYear
2006
fDate
Dec. 2006
Firstpage
438
Lastpage
442
Abstract
Web pages express their semantics not only through free texts but also through their layouts. While information is explicitly encoded in free texts, the layout implicitly reveals the semantic relationships among those texts. In this paper, we propose a framework for mining the semantics implied by the layout. The core of our work is a new HTML document model, called the nested table model, which synthesizes the DOM model and the syntax of the HTML language. Using the nested table model, we formally define the relevancy of free texts, so that free texts can be grouped by their relevancy. Our experimental results indicate that this relevancy correctly reflects the semantics of Web page layout.
Keywords
Internet; data mining; hypermedia markup languages; DOM model; HTML document model; Web page layout; free text relevancy; nested table model; semantic mining; semantical relationships; Computers; Conferences; Context modeling; Data mining; HTML; Relational databases; Search engines; Temperature; Weather forecasting; Web pages;
fLanguage
English
Publisher
ieee
Conference_Titel
Data Mining Workshops, 2006. ICDM Workshops 2006. Sixth IEEE International Conference on
Conference_Location
Hong Kong
Print_ISBN
0-7695-2702-7
Type
conf
DOI
10.1109/ICDMW.2006.163
Filename
4063667