DocumentCode :
1745835
Title :
Correlation-based document clustering using web logs
Author :
Su, Zhong ; Yang, Qiang ; Zhang, Hongjiang ; Xu, Xiaowei ; Hu, Yuhen
Author_Institution :
Dept. of Comput. Sci., Tsinghua Univ., Beijing, China
fYear :
2001
fDate :
6-6 Jan. 2001
Abstract :
A problem facing information retrieval on the web is how to effectively cluster large amounts of web documents. One approach is to cluster the documents based only on information provided by users' usage logs, not on the content of the documents. A major advantage of this approach is that relevancy information is objectively reflected by the usage logs; frequent simultaneous visits to two seemingly unrelated documents should indicate that they are in fact closely related. In this paper, we present a recursive density-based clustering algorithm that adaptively changes its parameters. Our clustering algorithm, RDBC (Recursive Density Based Clustering), is based on DBSCAN, a density-based algorithm that has proven able to process very large datasets. The fact that DBSCAN does not require the number of clusters to be determined in advance and is linear in time complexity makes it particularly attractive for web page clustering. It can be shown that RDBC has the same time complexity as the DBSCAN algorithm. In addition, we show both analytically and experimentally that our method yields clustering results superior to those of DBSCAN.
Keywords :
Internet; computational complexity; information resources; information retrieval; DBSCAN; clustering algorithm RDBC; correlation-based document clustering; density based algorithm; information retrieval; recursive density based clustering algorithm; relevancy information; time complexity; very large datasets; web documents; web logs; web page clustering; Clustering algorithms; Clustering methods; Information retrieval; Scattering;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
System Sciences, 2001. Proceedings of the 34th Annual Hawaii International Conference on
Conference_Location :
Maui, HI, USA
Print_ISBN :
0-7695-0981-9
Type :
conf
DOI :
10.1109/HICSS.2001.926536
Filename :
926536