DocumentCode :
3007099
Title :
Towards automatic clustering of similar pages in web applications
Author :
De Lucia, Andrea ; Risi, Michele ; Tortora, Genoveffa ; Scanniello, Giuseppe
Author_Institution :
Dipt. di Mat. e Inf., Univ. of Salerno, Fisciano, Italy
fYear :
2009
fDate :
25-26 Sept. 2009
Firstpage :
99
Lastpage :
108
Abstract :
In this paper, we propose an automatic approach to group web pages that are similar at the content level. The approach uses the Levenshtein string edit distance and Latent Semantic Indexing to compute page dissimilarity, and then groups the pages by iteratively applying a graph-theoretic clustering algorithm. To automate the clustering process, a prototype has been implemented and used to assess the proposed approach on three web sites.
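Illustrative note (not part of the original record): the abstract names the building blocks but gives no details. The sketch below is a minimal Python illustration of the general idea only. It makes three assumptions: dissimilarity is the Levenshtein distance normalised by the longer string, clustering is done by taking connected components of a thresholded graph, and the Latent Semantic Indexing step is omitted. The function names, the `threshold` parameter, and the choice of connected components are all hypothetical stand-ins and should not be read as the authors' algorithm.

```python
from itertools import combinations


def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def dissimilarity(a, b):
    """Edit distance scaled to [0, 1] by the longer string (an assumption)."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def cluster(pages, threshold):
    """Group page names whose contents are linked by dissimilarity <= threshold.

    Connected components (via union-find) stand in for the unspecified
    graph-theoretic clustering algorithm.
    """
    parent = {name: name for name in pages}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for p, q in combinations(pages, 2):
        if dissimilarity(pages[p], pages[q]) <= threshold:
            parent[find(p)] = find(q)

    groups = {}
    for name in pages:
        groups.setdefault(find(name), set()).add(name)
    return list(groups.values())
```

Calling `cluster({"a": "hello world", "b": "hello world!", "c": "completely different text"}, 0.2)` places `a` and `b` in one group and `c` in another, since only the first pair differs by a single edit.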
Keywords :
Web sites; content-based retrieval; graph theory; indexing; pattern clustering; string matching; Web site; graph theoretic clustering algorithm; group Web page; latent semantic indexing; Levenshtein string edit distance; Atmospheric measurements; Clustering algorithms; Navigation; Particle measurements; Prototypes; Weight measurement
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Web Systems Evolution (WSE), 2009 11th IEEE International Symposium on
Conference_Location :
Edmonton, AB
ISSN :
1550-4441
Print_ISBN :
978-1-4244-5124-1
Type :
conf
DOI :
10.1109/WSE.2009.5631253
Filename :
5631253