Title :
A Tool for Computing the Visual Similarity of Web Pages
Author :
Alpuente, María ; Romero, Daniel
Author_Institution :
DSIC-ELP, Univ. Politec. de Valencia, Valencia, Spain
Abstract :
Recently, we proposed a functional technique for identifying similar Web pages that is based on measuring tree similarity. The key idea behind the method is to transform each Web page into a compressed, normalized tree that effectively represents its visual structure. In this work, we develop an optimization of this technique that is based on memoization and that achieves significant improvements in both time and space efficiency. This work also presents a tool that implements the proposed technique, together with two case studies drawn from real scenarios. Experiments on real documents show that the optimized algorithm performs significantly better than the original technique and demonstrate the practicality of our approach.
Keywords :
Internet; Web sites; data visualisation; optimisation; Web pages; memoization; optimization; tree similarity; visual similarity; Algebra; Complexity theory; HTML; Visualization; XML; Web document clustering; Web page comparison; tree edit distance
Conference_Title :
Applications and the Internet (SAINT), 2010 10th IEEE/IPSJ International Symposium on
Conference_Location :
Seoul
Print_ISBN :
978-1-4244-7526-1
Electronic_ISBN :
978-0-7695-4107-5
DOI :
10.1109/SAINT.2010.17