DocumentCode :
3660556
Title :
An optimal distributed K-Means clustering algorithm based on cloudstack
Author :
Yingchi Mao;Ziyang Xu;Xiaofang Li;Ping Ping
Author_Institution :
College of Computer and Information Engineering, Hohai University, Nanjing, Jiangsu Province, China
fYear :
2015
Firstpage :
3149
Lastpage :
3156
Abstract :
Clustering algorithms are applied in many fields, especially data mining. As data volumes grow, clustering algorithms running on the traditional computing model can no longer finish in an acceptable time. To handle big data, data-mining algorithms have moved from their original single-core or single-machine implementations to parallel and distributed processing, which has become the most popular way to improve execution performance. This paper establishes a Hadoop distributed cluster based on CloudStack and implements an optimal distributed K-Means clustering algorithm based on MapReduce. The proposed algorithm achieves good clustering quality with efficient execution time. Experimental results show that the optimal distributed K-Means clustering algorithm performs better when dealing with large-scale data sets.
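The abstract does not say what makes the algorithm "optimal", so the following is only a minimal sketch of the standard MapReduce decomposition of K-Means that a Hadoop implementation typically builds on, not the authors' method. It simulates the map, combine, shuffle and reduce phases in memory in plain Python instead of running on Hadoop; all function names (nearest, map_phase, combine, reduce_phase, kmeans_mapreduce) and the max_iter and tol parameters are hypothetical.

```python
import math
from collections import defaultdict


def nearest(point, centroids):
    """Return the index of the centroid closest to point (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centroids[i])),
    )


def map_phase(split, centroids):
    """Mapper (hypothetical): emit (cluster_id, (coordinate_sum, count)) for each point."""
    for point in split:
        yield nearest(point, centroids), (list(point), 1)


def combine(pairs):
    """Combiner (hypothetical): pre-aggregate partial sums within one input split."""
    acc = {}
    for key, (vec, cnt) in pairs:
        if key in acc:
            total, n = acc[key]
            acc[key] = ([a + b for a, b in zip(total, vec)], n + cnt)
        else:
            acc[key] = (vec, cnt)
    return acc.items()


def reduce_phase(grouped, old_centroids):
    """Reducer (hypothetical): merge partial sums per cluster and compute new means."""
    new = list(old_centroids)  # a cluster that received no points keeps its old centroid
    for key, partials in grouped.items():
        total, n = None, 0
        for vec, cnt in partials:
            total = vec if total is None else [a + b for a, b in zip(total, vec)]
            n += cnt
        new[key] = tuple(x / n for x in total)
    return new


def kmeans_mapreduce(splits, centroids, max_iter=20, tol=1e-6):
    """Iterate map/combine/shuffle/reduce until centroids move less than tol."""
    for _ in range(max_iter):
        grouped = defaultdict(list)  # shuffle: group combiner output by cluster id
        for split in splits:
            for key, value in combine(map_phase(split, centroids)):
                grouped[key].append(value)
        new = reduce_phase(grouped, centroids)
        shift = max(math.dist(a, b) for a, b in zip(centroids, new))
        centroids = new
        if shift < tol:
            break
    return centroids


if __name__ == "__main__":
    splits = [[(0, 0), (0, 1)], [(10, 10), (10, 11)]]  # two "HDFS splits"
    print(kmeans_mapreduce(splits, [(0, 0), (10, 10)]))
```

With the two toy splits above, the first iteration moves the centroids to (0.0, 0.5) and (10.0, 10.5), and the second iteration leaves them unchanged, so the loop stops. The combiner is what keeps shuffle traffic small in a distributed setting: each split sends one partial sum per cluster rather than every point.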
Keywords :
"Clustering algorithms","Algorithm design and analysis","Computational modeling","Distributed databases","Complexity theory","Virtual machining","Data mining"
Publisher :
IEEE
Conference_Title :
Information and Automation, 2015 IEEE International Conference on
Type :
conf
DOI :
10.1109/ICInfA.2015.7279830
Filename :
7279830