DocumentCode :
3107554
Title :
CURE-NS: a hierarchical clustering algorithm with new shrinking scheme
Author :
Qian, Yun-tao ; Shi, Qing-Song ; Wang, Qi
Author_Institution :
Sch. of Comput. Sci. & Technol., Zhejiang Univ., Hangzhou, China
Volume :
2
fYear :
2002
fDate :
2002
Firstpage :
895
Abstract :
CURE (clustering using representatives) is an efficient clustering algorithm for large databases. It is more robust to outliers than other clustering methods and identifies clusters that have non-spherical shapes and wide variances in size. CURE describes each cluster with a fixed number of representative points, which are first chosen randomly and then shrunk toward the cluster mean. This shrinking operation plays a key role in CURE because it weakens the effect of outliers. However, we found that CURE's shrinking scheme depends on a hidden assumption that clusters are spherical, so CURE has difficulty with databases whose clusters have certain specific shapes. In this paper, we propose CURE-NS (CURE with a new shrinking scheme) to overcome this problem. CURE-NS uses the differences between the density values of the representative points to determine the direction and distance of shrinking, so the scheme does not depend on cluster shape. A range of experiments demonstrates that CURE-NS achieves better clustering performance than CURE.
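The shrinking step of the original CURE algorithm, as the abstract describes it, can be illustrated with a minimal sketch. The function names `centroid` and `shrink_toward_mean` and the parameter `alpha` (the shrink fraction) are assumptions made for illustration and are not taken from the paper. The sketch covers only CURE's mean-directed shrinking; the density-based scheme of CURE-NS is not specified in this record and is not reproduced here.

```python
from typing import List, Sequence

Point = Sequence[float]


def centroid(points: Sequence[Point]) -> List[float]:
    """Return the coordinate-wise mean of a non-empty set of points."""
    n = len(points)
    dim = len(points[0])
    return [sum(p[d] for p in points) / n for d in range(dim)]


def shrink_toward_mean(
    representatives: Sequence[Point],
    cluster_points: Sequence[Point],
    alpha: float,
) -> List[List[float]]:
    """Move each representative a fraction alpha of the way to the cluster mean.

    alpha = 0 leaves the representatives unchanged; alpha = 1 collapses
    them all onto the mean. (alpha is a hypothetical parameter name.)
    """
    mean = centroid(cluster_points)
    return [
        [r[d] + alpha * (mean[d] - r[d]) for d in range(len(mean))]
        for r in representatives
    ]


# Example: an elongated cluster whose mean is (5, 0).
cluster = [(0, 0), (10, 0), (5, 1), (5, -1)]
reps = [(0, 0), (10, 0), (5, 1)]
shrunk = shrink_toward_mean(reps, cluster, alpha=0.5)
# shrunk == [[2.5, 0.0], [7.5, 0.0], [5.0, 0.5]]
```

In this example every representative moves along the line to the single cluster mean, whatever the cluster's shape. The two end points of the elongated cluster each move 2.5 units inward, while the point on the short axis moves only 0.5, which is one way to picture the shape dependence the abstract refers to.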
Keywords :
Gaussian distribution; data mining; database management systems; pattern clustering; Gaussian density distributions; clustering with representatives; data analysis; data mining; density distribution; hierarchical clustering; large databases; outliers; shrinking operation; Clustering algorithms; Clustering methods; Computer science; Distributed computing; Noise shaping; Robustness; Scattering; Shape; Spatial databases; Working environment noise;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Machine Learning and Cybernetics, 2002. Proceedings. 2002 International Conference on
Print_ISBN :
0-7803-7508-4
Type :
conf
DOI :
10.1109/ICMLC.2002.1174512
Filename :
1174512