Title :
Towards automatic clustering of similar pages in web applications
Author :
De Lucia, Andrea ; Risi, Michele ; Tortora, Genoveffa ; Scanniello, Giuseppe
Author_Institution :
Dipt. di Mat. e Inf., Univ. of Salerno, Fisciano, Italy
Abstract :
In this paper, we propose an automatic approach to group web pages that are similar at the content level. The approach uses the Levenshtein string edit distance and Latent Semantic Indexing to compute the dissimilarity between pages, and then groups the pages by iteratively applying a graph-theoretic clustering algorithm. To automate the clustering process, a prototype has been implemented and used to assess the proposed approach on three web sites.
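The pipeline named in the abstract can be illustrated with a minimal sketch. It is not the authors' prototype: it covers only the Levenshtein part of the dissimilarity (Latent Semantic Indexing is omitted), it runs a single pass rather than iterating, and it uses connected components of a thresholded graph as a simple stand-in for the graph-theoretic clustering step. The function names, the length normalization, and the `threshold` parameter are all assumptions made for illustration.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def dissimilarity(a: str, b: str) -> float:
    """Edit distance scaled to [0, 1] by the longer string (an assumption)."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def cluster_pages(pages: dict[str, str], threshold: float) -> list[set[str]]:
    """Link pages whose dissimilarity is at most `threshold`, then return
    the connected components of the resulting graph as clusters."""
    names = list(pages)
    neighbours = {name: set() for name in names}
    for u, v in combinations(names, 2):
        if dissimilarity(pages[u], pages[v]) <= threshold:
            neighbours[u].add(v)
            neighbours[v].add(u)

    clusters, seen = [], set()
    for start in names:
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:  # depth-first traversal of one component
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbours[node] - component)
        seen |= component
        clusters.append(component)
    return clusters
```

Calling `cluster_pages({"a": "aaaa", "b": "aaab", "c": "zzzz"}, threshold=0.25)` places pages `a` and `b` in one cluster and `c` in another, since `a` and `b` differ by a single substitution while `c` shares no characters with either.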
Keywords :
Web sites; content-based retrieval; graph theory; indexing; pattern clustering; string matching; Web site; graph theoretic clustering algorithm; group Web page; latent semantic indexing; Levenshtein string edit distance; Atmospheric measurements; Clustering algorithms; Navigation; Particle measurements; Prototypes; Weight measurement
Conference_Title :
Web Systems Evolution (WSE), 2009 11th IEEE International Symposium on
Conference_Location :
Edmonton, AB
Print_ISBN :
978-1-4244-5124-1
DOI :
10.1109/WSE.2009.5631253