DocumentCode
2520299
Title
A Tool for Computing the Visual Similarity of Web Pages
Author
Alpuente, María; Romero, Daniel
Author_Institution
DSIC-ELP, Univ. Politec. de Valencia, Valencia, Spain
fYear
2010
fDate
19-23 July 2010
Firstpage
45
Lastpage
51
Abstract
Recently, we proposed a functional technique for identifying similar Web pages that is based on measuring tree similarity. The key idea behind the method is to transform each Web page into a compressed, normalized tree that effectively represents its visual structure. In this work, we develop an optimization of this technique that is based on memoization and that achieves significant improvements in efficiency in both time and space. This work also presents a tool that implements the proposed technique as well as two case studies for two real scenarios. Experiments on real documents show that the optimized algorithm performs significantly better than the original technique and demonstrate the practicality of our approach.
Keywords
Internet; Web sites; data visualisation; optimisation; Web pages; memoization; optimization; tree similarity; visual similarity; Algebra; Complexity theory; HTML; Visualization; XML; Web document clustering; Web page comparison; tree edit distance
fLanguage
English
Publisher
IEEE
Conference_Title
Applications and the Internet (SAINT), 2010 10th IEEE/IPSJ International Symposium on
Conference_Location
Seoul
Print_ISBN
978-1-4244-7526-1
Electronic_ISBN
978-0-7695-4107-5
Type
conf
DOI
10.1109/SAINT.2010.17
Filename
5598174