Title :
Parallel K-Medoids clustering algorithm based on Hadoop
Author :
Yaobin Jiang ; Jiongmin Zhang
Author_Institution :
Dept. of Comput. Sci. & Technol., East China Normal Univ., Shanghai, China
Abstract :
The K-Medoids clustering algorithm solves the K-Means algorithm's problem with processing outlier samples, but it is not able to process big data because of its time complexity [1]. MapReduce is a parallel programming model for processing big data, and it has been implemented in Hadoop. To overcome this limit on data size, a parallel K-Medoids algorithm based on Hadoop, HK-Medoids, is proposed. Every submitted job consists of many iterative MapReduce procedures: in the map phase, each sample is assigned to the cluster whose center is most similar to the sample; in the combine phase, an intermediate center is calculated for each cluster; and in the reduce phase, the new center is calculated. The iteration stops when the new center is similar to the old one. The experimental results show that the HK-Medoids algorithm gives a good clustering result and achieves linear speedup on big data.
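The map, combine, and reduce phases described in the abstract can be sketched as a single-process simulation in plain Python. This is only an illustration under stated assumptions, not the authors' Hadoop implementation: the abstract does not say how the intermediate or new centers are computed, so the sketch assumes Euclidean distance, takes the medoid of each split's local cluster members as the intermediate center, and takes the medoid of those intermediate centers as the new center. All function names and the `max_iter` and `tol` parameters are hypothetical.

```python
from collections import defaultdict


def dist(a, b):
    # Euclidean distance (an assumption; the abstract only says "similar").
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def medoid(points):
    # The point with the smallest total distance to all points in the group.
    return min(points, key=lambda p: sum(dist(p, q) for q in points))


def map_phase(split, medoids):
    # Emit (cluster index, sample) for the nearest current medoid.
    for p in split:
        idx = min(range(len(medoids)), key=lambda i: dist(p, medoids[i]))
        yield idx, p


def combine_phase(pairs):
    # Per split: group samples by cluster and keep one intermediate medoid each.
    groups = defaultdict(list)
    for idx, p in pairs:
        groups[idx].append(p)
    return {idx: medoid(pts) for idx, pts in groups.items()}


def reduce_phase(candidates_per_split, old_medoids):
    # Merge intermediate medoids from all splits and pick the new medoid per
    # cluster (assumed rule). An empty cluster keeps its old medoid.
    merged = defaultdict(list)
    for cands in candidates_per_split:
        for idx, c in cands.items():
            merged[idx].append(c)
    return [
        medoid(merged[i]) if i in merged else old_medoids[i]
        for i in range(len(old_medoids))
    ]


def hk_medoids_sketch(splits, medoids, max_iter=20, tol=1e-9):
    # Repeat map -> combine -> reduce until the medoids stop moving.
    for _ in range(max_iter):
        cands = [combine_phase(map_phase(s, medoids)) for s in splits]
        new_medoids = reduce_phase(cands, medoids)
        if all(dist(a, b) <= tol for a, b in zip(new_medoids, medoids)):
            return new_medoids
        medoids = new_medoids
    return medoids


# Two input splits stand in for two HDFS blocks handled by separate mappers.
splits = [
    [(0, 0), (0, 1), (10, 10)],
    [(1, 0), (10, 11), (11, 10)],
]
result = hk_medoids_sketch(splits, medoids=[(0, 0), (10, 10)])
```

Because every returned center is selected from the input samples rather than averaged, the result stays a medoid, which is what distinguishes this family of methods from K-Means.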
Keywords :
Big Data; computational complexity; iterative methods; parallel programming; pattern clustering; HK-medoids; Hadoop; big data processing; iterative MapReduce procedures; k-means algorithm; map phase; outlier sample processing; parallel k-medoids clustering algorithm; parallel programming model; time complexity; Algorithm design and analysis; Clustering algorithms; Computational modeling; Educational institutions; Indexes; Partitioning algorithms; Programming; Big-Data; Clustering Analysis; Hadoop; K-Medoids; MapReduce;
Conference_Title :
Software Engineering and Service Science (ICSESS), 2014 5th IEEE International Conference on
Conference_Location :
Beijing
Print_ISBN :
978-1-4799-3278-8
DOI :
10.1109/ICSESS.2014.6933652