DocumentCode :
2252137
Title :
Scalable clustering: a distributed approach
Author :
More, P. ; Hall, Lawrence O.
Author_Institution :
Dept. of Comput. Sci. & Eng., South Florida Univ., Tampa, FL, USA
Volume :
1
fYear :
2004
fDate :
25-29 July 2004
Firstpage :
143
Abstract :
The ever-increasing size of data sets and the poor scalability of clustering algorithms have drawn attention to distributed clustering for partitioning large data sets. In this paper we propose an algorithm that clusters large-scale data sets without clustering all the data at once. The data are randomly divided into disjoint subsets of almost equal size. We then cluster each subset using the hard k-means or fuzzy k-means algorithm. The centroids of the subsets form an ensemble. A centroid correspondence algorithm transitively solves the correspondence problem among the ensemble of centroids, and the corresponding centroids are combined to form a global set of centroids. Experimental results show that, most of the time, the pattern of clusters generated by our algorithm is similar to the pattern generated by clustering all the data at once. We also show that the examples on which our algorithm and clustering all the data at once disagree lie on the spatial border of clusters.
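The pipeline outlined in the abstract (random split, per-subset clustering, centroid correspondence, combination) can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not say how correspondence is solved or how centroids are combined, so the sketch assumes a brute-force permutation match against the first subset's centroids (minimizing total Euclidean distance) and an unweighted mean of the matched centroids. It uses hard k-means only, and all function names (`kmeans`, `align`, `distributed_kmeans`) are hypothetical.

```python
import itertools
import math
import random


def kmeans(points, k, iters=50, rng=None):
    """Plain hard k-means (Lloyd's algorithm) on a list of coordinate tuples."""
    rng = rng or random.Random(0)
    centroids = rng.sample(points, k)  # initialise from k distinct data points
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Recompute each centroid; keep the old one if its cluster is empty.
        updated = [
            tuple(sum(x) / len(c) for x in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids


def align(reference, centroids):
    """Assumed correspondence step: reorder `centroids` so that the total
    Euclidean distance to `reference` is minimal (brute force, small k only)."""
    best = min(
        itertools.permutations(centroids),
        key=lambda perm: sum(math.dist(r, c) for r, c in zip(reference, perm)),
    )
    return list(best)


def distributed_kmeans(points, k, n_subsets, seed=0):
    """Cluster disjoint random subsets separately, then merge their centroids."""
    rng = random.Random(seed)
    shuffled = points[:]
    rng.shuffle(shuffled)
    # Disjoint subsets whose sizes differ by at most one.
    subsets = [shuffled[i::n_subsets] for i in range(n_subsets)]
    ensemble = [kmeans(s, k, rng=rng) for s in subsets]
    reference = ensemble[0]
    aligned = [reference] + [align(reference, c) for c in ensemble[1:]]
    # Assumed combination step: unweighted mean of corresponding centroids.
    return [
        tuple(sum(x) / len(group) for x in zip(*group))
        for group in zip(*aligned)
    ]
```

Calling `distributed_kmeans(points, k=3, n_subsets=4)` on a list of coordinate tuples returns `k` global centroids. The permutation search costs `k!` comparisons per subset, so a larger `k` would need a proper assignment solver.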
Keywords :
fuzzy set theory; pattern clustering; very large databases; fuzzy k-means algorithm; large-scale data sets; scalable clustering; Clustering algorithms; Computer science; Data engineering; Euclidean distance; Fuzzy logic; Fuzzy sets; Iterative algorithms; Partitioning algorithms; Scalability; Testing;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Fuzzy Systems, 2004. Proceedings. 2004 IEEE International Conference on
ISSN :
1098-7584
Print_ISBN :
0-7803-8353-2
Type :
conf
DOI :
10.1109/FUZZY.2004.1375705
Filename :
1375705