Title :
Scalable Fast Evolutionary k-Means Clustering
Author :
Gilberto Viana de Oliveira;Murilo Coelho Naldi
Author_Institution :
Dept. de Inf., Univ. Fed. de Vicosa, Vicosa, Brazil
Abstract :
The increasing amount of data demands greater scalability from clustering algorithms. The intrinsic parallelism of the MapReduce model brings manageability and reliability to large-scale distributed operations. However, its restrictions hinder the direct application of several traditional clustering algorithms. K-means is one of the few clustering algorithms that satisfy the MapReduce constraints, but it requires the number of clusters to be specified in advance and is sensitive to their initialization. This paper proposes a MapReduce algorithm that evolves clusters without requiring k-means' parameters to be specified. Through evolutionary operators, the clusters already obtained are used to search for better solutions, allowing the algorithm to find quality solutions quickly. The algorithm is compared with state-of-the-art MapReduce versions of a systematic algorithm that is able to find the number of k-means clusters and their initializations. Computational experiments and statistical analyses of the results indicate that the proposed algorithm obtains clusters of quality equal or superior to those of the compared algorithm, in less time.
Keywords :
"Clustering algorithms","Partitioning algorithms","Algorithm design and analysis","Prototypes","Data models","Sociology","Statistics"
Conference_Title :
Intelligent Systems (BRACIS), 2015 Brazilian Conference on
DOI :
10.1109/BRACIS.2015.20