Title :
Semantic HTML Page Segmentation using Type Analysis
Author :
Yang, Xin ; Xiang, Peifeng ; Shi, Yuanchun
Author_Institution :
Dept. of Comput. Sci. & Technol., Tsinghua Univ., Beijing
Abstract :
Semantic information is necessary for semantic Web processing and is useful to Web adaptation services such as personalization of users' browsing activities on small screen devices. However, semantic information is encoded only implicitly in most existing HTML documents. This paper describes a page segmentation method that parses Web pages into rectangular segments, called blocks, that carry semantic information. Existing page segmentation techniques are either built mainly on the HTML DOM structure or purely vision-based, and are not accurate enough in either visual presentation or semantics. Our approach is automatic and is based on a refined typing system that tightly couples type analysis with indispensable visual cues to generate blocks organized in a tree structure, aiming to achieve a high degree of coherence in both the semantic and visual views. Experimental results show that our method achieves better accuracy and completeness than existing ones.
Keywords :
hypermedia markup languages; semantic Web; tree data structures; Web adaptation service; pattern discovery; semantic HTML page segmentation; semantic Web; semantic structural tree; type analysis; Application software; Computer science; Filtering; HTML; Particle separators; Pervasive computing; Semantic Web; Stress; Tree data structures; Web pages; Block; Page Segmentation; Pattern Discovery; Semantic Structural Tree; Type Recognition; Visual Cues;
Conference_Title :
Pervasive Computing and Applications, 2006 1st International Symposium on
Conference_Location :
Urumqi
Print_ISBN :
1-4244-0326-X
Electronic_ISBN :
1-4244-0326-X
DOI :
10.1109/SPCA.2006.297506