DocumentCode :
2410658
Title :
Research on Clustering Algorithm and Its Parallelization Strategy
Author :
Li, Lingjuan ; Xi, Yang
fYear :
2011
fDate :
21-23 Oct. 2011
Firstpage :
325
Lastpage :
328
Abstract :
As a hot topic of recent study, cloud computing can help us analyze and process massive data effectively. Clustering is one of the important tasks of data mining. This paper focuses on how to improve the performance of clustering algorithms on massive data. A hierarchy-based DBSCAN algorithm (named HDBSCAN) is proposed by improving the existing density-based clustering algorithm DBSCAN, and parallel execution strategies for HDBSCAN on the MapReduce framework of cloud computing are designed. The performance of HDBSCAN is tested experimentally on Hadoop, a cloud computing platform. The experimental results show that HDBSCAN can effectively improve the efficiency of clustering massive data.
Keywords :
Algorithm design and analysis; Cloud computing; Clustering algorithms; Data mining; Educational institutions; Noise; Software algorithms; MapReduce; cloud computing; density-based clustering; hierarchical clustering;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Computational and Information Sciences (ICCIS), 2011 International Conference on
Conference_Location :
Chengdu, China
Print_ISBN :
978-1-4577-1540-2
Type :
conf
DOI :
10.1109/ICCIS.2011.223
Filename :
6086201