Title :
An enhanced agglomerative fuzzy k-means clustering method with mapreduce implementation on Hadoop platform
Author :
Ruixin Zhang ; Yinglin Wang
Author_Institution :
Dept. of Comput. Sci. & Eng., Shanghai Jiaotong Univ., Shanghai, China
Abstract :
In this paper, an enhanced agglomerative fuzzy k-means clustering algorithm with a MapReduce implementation is proposed. The algorithm introduces an initial center selection method that improves accuracy and speeds up the convergence of the agglomerative fuzzy k-means algorithm. A MapReduce implementation based on Apache Hadoop is then presented to scale the algorithm to large datasets. Experiments were conducted on a synthetic dataset, the WINE dataset from the UCI Repository, and a randomly generated large dataset. The experimental results show that the proposed algorithm can identify the true number of clusters and produce accurate results, with good scalability on large datasets.
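The abstract combines two ideas: an agglomerative fuzzy k-means iteration and a MapReduce decomposition of that iteration. The following is a minimal single-process sketch of how one iteration might be split into a map step and a reduce step. It is not the authors' implementation. It assumes the entropy-penalized objective commonly used for agglomerative fuzzy k-means, where the membership of point x in center z_j is proportional to exp(-||x - z_j||^2 / lam). The function names (memberships, map_phase, reduce_phase, afkm_iteration, merge_centers) and the parameters lam and tol are hypothetical. No Hadoop API is used. The paper's initial center selection method is not described in this record, so the usage example supplies hand-picked initial centers instead.

```python
import math
from collections import defaultdict


def memberships(x, centers, lam):
    """Assumed entropy-penalized memberships: u_j proportional to exp(-||x - z_j||^2 / lam)."""
    d = [sum((a - b) ** 2 for a, b in zip(x, z)) for z in centers]
    m = min(d)  # shift by the smallest distance for numerical stability
    w = [math.exp(-(dj - m) / lam) for dj in d]
    s = sum(w)
    return [wj / s for wj in w]


def map_phase(split, centers, lam):
    """Hypothetical mapper: for each point in one input split, emit (center index, (u * x, u))."""
    for x in split:
        for j, u in enumerate(memberships(x, centers, lam)):
            yield j, ([u * xi for xi in x], u)


def reduce_phase(pairs, dim):
    """Hypothetical reducer: sum weighted points per center, then divide by the total weight."""
    sums = defaultdict(lambda: [0.0] * dim)
    weights = defaultdict(float)
    for j, (vec, w) in pairs:
        sums[j] = [s + v for s, v in zip(sums[j], vec)]
        weights[j] += w
    return {j: [s / weights[j] for s in sums[j]] for j in sums if weights[j] > 0}


def afkm_iteration(splits, centers, lam):
    """One simulated map/reduce round over all splits; a center with zero weight stays where it is."""
    pairs = [p for split in splits for p in map_phase(split, centers, lam)]
    new = reduce_phase(pairs, len(centers[0]))
    return [new.get(j, list(c)) for j, c in enumerate(centers)]


def merge_centers(centers, tol):
    """Keep one representative from each group of centers that lie within tol of each other."""
    merged = []
    for c in centers:
        if all(math.dist(c, m) > tol for m in merged):
            merged.append(c)
    return merged


if __name__ == "__main__":
    # Two input splits, standing in for two mappers' inputs; each holds points from both groups.
    splits = [
        [(0, 0), (0, 1), (10, 10), (10, 11)],
        [(1, 0), (1, 1), (11, 10), (11, 11)],
    ]
    # Start with more centers than clusters; coincident centers are merged at the end.
    centers = [[0, 0], [1, 1], [10, 10], [11, 11]]
    for _ in range(50):
        centers = afkm_iteration(splits, centers, lam=2.0)
    print(merge_centers(centers, tol=1e-3))
```

With these toy points, the four starting centers should collapse pairwise, leaving two centers near (0.5, 0.5) and (10.5, 10.5). That is how an agglomerative variant can arrive at a cluster count without fixing it in advance. The mapper emits only per-center partial sums, so the data shuffled to the reducer grows with the number of centers times the number of points per split. A combiner that pre-aggregates per split would reduce it to one partial sum per center per split.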
Keywords :
distributed programming; fuzzy set theory; pattern clustering; public domain software; Apache Hadoop; Hadoop platform; MapReduce; UCI Repository; WINE dataset; convergence speed; enhanced agglomerative fuzzy k-means clustering method; initial center selection method; large scale datasets; randomly generated large dataset; synthetic data set; Algorithm design and analysis; Classification algorithms; Clustering algorithms; Indexes; Machine learning algorithms; Scalability; Signal processing algorithms; MapReduce; agglomerative fuzzy k-means; initial center selection; number of clusters
Conference_Title :
Progress in Informatics and Computing (PIC), 2014 International Conference on
Conference_Location :
Shanghai
Print_ISBN :
978-1-4799-2033-4
DOI :
10.1109/PIC.2014.6972387