• DocumentCode
    2520299
  • Title
    A Tool for Computing the Visual Similarity of Web Pages
  • Author
    Alpuente, María; Romero, Daniel
  • Author_Institution
    DSIC-ELP, Univ. Politec. de Valencia, Valencia, Spain
  • fYear
    2010
  • fDate
    19-23 July 2010
  • Firstpage
    45
  • Lastpage
    51
  • Abstract
    Recently, we proposed a functional technique for identifying similar Web pages based on measuring tree similarity. The key idea behind the method is to transform each Web page into a compressed, normalized tree that effectively represents its visual structure. In this work, we develop a memoization-based optimization of this technique that significantly improves its efficiency in both time and space. We also present a tool that implements the proposed technique, together with two case studies drawn from real scenarios. Experiments on real documents show that the optimized algorithm significantly outperforms the original technique and demonstrate the practicality of our approach.
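    The abstract does not specify the similarity measure or the memoization scheme. As a rough illustration only, the sketch below uses simple tree matching (a standard top-down tree similarity measure, swapped in here and not necessarily the measure used by the authors) on hypothetical (label, children) tuples standing in for the compressed, normalized visual trees, and memoizes subtree comparisons with functools.lru_cache. The helpers node, stm, size, and similarity, as well as the size-based normalization, are assumptions made for this example.

```python
from functools import lru_cache
from typing import Tuple

# ASSUMPTION: a tree is a (label, children) pair. Tuples keep trees hashable,
# so pairs of subtrees can serve directly as memoization keys.
Tree = Tuple[str, Tuple["Tree", ...]]


def node(label: str, *children: Tree) -> Tree:
    """Build a hashable tree node (hypothetical helper for illustration)."""
    return (label, tuple(children))


@lru_cache(maxsize=None)
def stm(a: Tree, b: Tree) -> int:
    """Simple tree matching: size of the largest top-down common mapping.

    Results are cached per (a, b) pair, so a subtree comparison that
    recurs during the recursion is computed only once.
    """
    if a[0] != b[0]:
        return 0  # roots differ: nothing below this pair can match top-down
    xs, ys = a[1], b[1]
    # m[i][j] = best matching of the first i children of a with the first j of b
    m = [[0] * (len(ys) + 1) for _ in range(len(xs) + 1)]
    for i in range(1, len(xs) + 1):
        for j in range(1, len(ys) + 1):
            m[i][j] = max(m[i - 1][j], m[i][j - 1],
                          m[i - 1][j - 1] + stm(xs[i - 1], ys[j - 1]))
    return m[len(xs)][len(ys)] + 1  # +1 for the matched roots


def size(t: Tree) -> int:
    """Number of nodes in a tree."""
    return 1 + sum(size(c) for c in t[1])


def similarity(a: Tree, b: Tree) -> float:
    """ASSUMPTION: normalize the matching to [0, 1] by the average tree size."""
    return 2 * stm(a, b) / (size(a) + size(b))


if __name__ == "__main__":
    page1 = node("html", node("body", node("div", node("p"), node("img")), node("table")))
    page2 = node("html", node("body", node("div", node("p")), node("ul")))
    print(round(similarity(page1, page2), 3))  # 0.727
```

    Here the two toy pages share html, body, div, and p (4 matched nodes out of 6 and 5), giving 8/11. Because lru_cache keys on the tree tuples themselves, equal subtrees that recur (common in templated pages) are compared once; note that Python rehashes nested tuples on every lookup, so very large trees may benefit from interning subtrees or keying the cache on node identifiers instead.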
  • Keywords
    Internet; Web sites; data visualisation; optimisation; Web pages; memoization; optimization; tree similarity; visual similarity; Algebra; Complexity theory; HTML; Visualization; XML; Web document clustering; Web page comparison; tree edit distance
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    2010 10th IEEE/IPSJ International Symposium on Applications and the Internet (SAINT)
  • Conference_Location
    Seoul
  • Print_ISBN
    978-1-4244-7526-1
  • Electronic_ISBN
    978-0-7695-4107-5
  • Type
    conf
  • DOI
    10.1109/SAINT.2010.17
  • Filename
    5598174