DocumentCode :
2014759
Title :
A General Approach for Partitioning Web Page Content Based on Geometric and Style Information
Author :
Guo, Hui ; Mahmud, Jalal ; Borodin, Yevgen ; Stent, Amanda ; Ramakrishnan, I.V.
Author_Institution :
Stony Brook Univ., Stony Brook
Volume :
2
fYear :
2007
fDate :
23-26 Sept. 2007
Firstpage :
929
Lastpage :
933
Abstract :
In this paper, we describe a general-purpose approach for partitioning Web page content. The novelty of our approach lies in the use of detailed layout information from a Web page renderer to determine spatial locality and identify visual separators, and the use of relaxed matching over presentation style information to determine presentation style similarity. We present several examples to illustrate the generality of our approach.
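The abstract combines two cues: geometric layout from a renderer (spatial locality and visual separators) and relaxed matching of presentation style. The sketch below shows how such cues could drive a simple top-to-bottom segmentation. It is a minimal illustration under assumed data structures, not the authors' algorithm. The `Box` schema, the helpers `is_separator`, `style_similar`, `spatially_close` and `partition`, the chosen style keys, and every threshold are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Box:
    """A rendered element: pixel bounding box plus style attributes (hypothetical schema)."""
    x: float
    y: float
    w: float
    h: float
    style: dict = field(default_factory=dict)

def is_separator(b: Box, page_width: float) -> bool:
    # Assumption: a thin, wide box (e.g. an <hr> or a border line) is a visual separator.
    return b.h <= 3 and b.w >= 0.5 * page_width

def style_similar(a: dict, b: dict,
                  keys=("font-family", "font-size", "font-weight", "color"),
                  size_tol=1.0, min_ratio=0.75) -> bool:
    # "Relaxed" matching (assumed form): not every attribute must agree,
    # and font sizes within size_tol pixels count as equal.
    matches = 0
    for k in keys:
        va, vb = a.get(k), b.get(k)
        if k == "font-size" and va is not None and vb is not None:
            matches += abs(float(va) - float(vb)) <= size_tol
        else:
            matches += va == vb
    return matches / len(keys) >= min_ratio

def spatially_close(a: Box, b: Box, max_gap=20.0) -> bool:
    # Spatial locality (assumed form): small vertical gap from a (above) to b (below)
    # and some horizontal overlap between the two boxes.
    gap = b.y - (a.y + a.h)
    overlap = min(a.x + a.w, b.x + b.w) - max(a.x, b.x)
    return gap <= max_gap and overlap > 0

def partition(boxes, page_width):
    """Group boxes in reading order; start a new segment at separators or cue breaks."""
    segments, current = [], []
    for b in sorted(boxes, key=lambda t: (t.y, t.x)):
        if is_separator(b, page_width):
            if current:
                segments.append(current)
            current = []
            continue
        prev = current[-1] if current else None
        if prev and not (spatially_close(prev, b) and style_similar(prev.style, b.style)):
            segments.append(current)
            current = []
        current.append(b)
    if current:
        segments.append(current)
    return segments

# Usage: a heading, two body lines, a horizontal rule, then one more body line.
body = {"font-family": "Arial", "font-size": 14, "font-weight": "normal", "color": "#000"}
page = [
    Box(0, 0, 800, 30, {"font-family": "Arial", "font-size": 24, "font-weight": "bold", "color": "#000"}),
    Box(0, 40, 800, 16, dict(body)),
    Box(0, 60, 800, 16, dict(body)),
    Box(0, 90, 800, 2),  # thin full-width box acting as a separator
    Box(0, 100, 800, 16, dict(body, color="#333")),
]
segments = partition(page, page_width=800)
print([len(s) for s in segments])  # → [1, 2, 1]
```

Here the heading splits from the body text on style (size and weight differ), the two body lines merge on both cues, and the rule closes that segment. A real system would take boxes and computed styles from a browser engine and would likely also handle columns and nesting, which this sketch ignores.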
Keywords :
Internet; general-purpose approach; geometric-style information; partitioning Web page content; visual separators; Clustering algorithms; Computer science; HTML; Humans; Marketing and sales; Ontologies; Particle separators; Partitioning algorithms; Rendering (computer graphics); Web pages;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Document Analysis and Recognition, 2007. ICDAR 2007. Ninth International Conference on
Conference_Location :
Parana
ISSN :
1520-5363
Print_ISBN :
978-0-7695-2822-9
Type :
conf
DOI :
10.1109/ICDAR.2007.4377051
Filename :
4377051