Regional Information Center for Science and Technology - Scalable clustering: a distributed approach

DocumentCode :

2252137

Title :

Scalable clustering: a distributed approach

Author :

More, P.; Hall, Lawrence O.

Author_Institution :

Dept. of Comput. Sci. & Eng., South Florida Univ., Tampa, FL, USA

Volume :

fYear :

2004

fDate :

25-29 July 2004

Firstpage :

143

Abstract :

The ever-increasing size of data sets and the poor scalability of clustering algorithms have drawn attention to distributed clustering for partitioning large data sets. In this paper we propose an algorithm that clusters large-scale data sets without clustering all the data at once. The data are randomly divided into disjoint subsets of almost equal size. Each subset is then clustered using the hard k-means or fuzzy k-means algorithm. The centroids of the subsets form an ensemble. A centroid correspondence algorithm transitively solves the correspondence problem among the ensemble of centroids. The centroids are then combined to form a global set of centroids. Experimental results show that, most of the time, the pattern of clusters generated by our algorithm is similar to the pattern generated by clustering all the data at once. We also show that the examples on which the two clusterings disagree lie on the spatial borders of the clusters.
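
The pipeline described in the abstract (random disjoint split, per-subset clustering, centroid correspondence, combination) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses plain hard k-means only, the greedy nearest-centroid matching in `align` is an assumed stand-in for the paper's correspondence algorithm, the chained alignment is one possible reading of "transitively", and the unweighted mean used to combine centroids is likewise an assumption. All function names (`kmeans`, `align`, `distributed_kmeans`) are hypothetical.

```python
import math
import random


def kmeans(points, k, iters=50, seed=0):
    """Plain hard k-means (Lloyd's algorithm) on a list of coordinate tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            groups[nearest].append(p)
        # Recompute each centroid as the mean of its group; keep it if the group is empty.
        updated = [
            tuple(sum(c) / len(g) for c in zip(*g)) if g else centroids[j]
            for j, g in enumerate(groups)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids


def align(reference, centroids):
    """Reorder `centroids` so that index j corresponds to reference[j].

    Greedy one-to-one matching on Euclidean distance; an assumed stand-in,
    not the paper's correspondence algorithm.
    """
    pairs = sorted(
        (math.dist(r, c), i, j)
        for i, r in enumerate(reference)
        for j, c in enumerate(centroids)
    )
    aligned, used = [None] * len(reference), set()
    for _, i, j in pairs:
        if aligned[i] is None and j not in used:
            aligned[i] = centroids[j]
            used.add(j)
    return aligned


def distributed_kmeans(points, k, n_subsets, seed=0):
    """Cluster disjoint random subsets separately, then merge their centroids."""
    rng = random.Random(seed)
    shuffled = points[:]
    rng.shuffle(shuffled)
    # Disjoint subsets whose sizes differ by at most one.
    subsets = [shuffled[i::n_subsets] for i in range(n_subsets)]
    ensemble = [kmeans(s, k, seed=seed) for s in subsets]
    # Chain the matching: each subset is aligned to the previously aligned one.
    aligned = [ensemble[0]]
    for cents in ensemble[1:]:
        aligned.append(align(aligned[-1], cents))
    # Combine corresponding centroids with an unweighted mean (an assumption).
    return [
        tuple(sum(c) / n_subsets for c in zip(*(a[j] for a in aligned)))
        for j in range(k)
    ]
```

Calling `distributed_kmeans(points, k=2, n_subsets=4)` on a list of 2-D tuples returns two global centroids without ever running k-means on the full data set. Because each subset is clustered independently, the per-subset calls could run on separate machines, with only the centroids exchanged.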

Keywords :

fuzzy set theory; pattern clustering; very large databases; fuzzy k-means algorithm; large-scale data sets; scalable clustering; Clustering algorithms; Computer science; Data engineering; Euclidean distance; Fuzzy logic; Fuzzy sets; Iterative algorithms; Partitioning algorithms; Scalability; Testing;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Fuzzy Systems, 2004. Proceedings. 2004 IEEE International Conference on

ISSN :

1098-7584

Print_ISBN :

0-7803-8353-2

Type :

conf

DOI :

10.1109/FUZZY.2004.1375705

Filename :

1375705

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2252137