DocumentCode :
2938169
Title :
On the Performance of Distributed Clustering Algorithms in File and Streaming Processing Systems
Author :
Ericson, Kathleen; Pallickara, Shrideep
Author_Institution :
Comput. Sci. Dept., Colorado State Univ., Fort Collins, CO, USA
fYear :
2011
fDate :
5-8 Dec. 2011
Firstpage :
33
Lastpage :
40
Abstract :
There is often a need to cluster voluminous amounts of data. Such clustering has applications in fields such as pattern recognition, data mining, bioinformatics, and recommendation systems. Here we evaluate the performance of four clustering algorithms, viz. K-means, Fuzzy k-means, Dirichlet, and Latent Dirichlet Allocation, within two different cloud runtimes: Hadoop and Granules. Our benchmarks use identical clustering code with both Hadoop and Granules. The differences between these implementations stem from how the Hadoop and Granules runtimes (1) support and manage the lifecycle of individual computations, and (2) orchestrate the exchange of data between different stages of the computational pipeline during successive iterations of the clustering algorithm. We also include an analysis of our results for each of these clustering algorithms in a distributed setting.
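The abstract attributes the performance differences partly to how each runtime moves data between pipeline stages across successive iterations. A minimal sketch of that iterative structure follows, assuming a plain MapReduce-style decomposition of K-means. It is not the authors' benchmark code and does not use the Mahout, Hadoop, or Granules APIs; the function names (nearest, map_phase, reduce_phase, kmeans) are hypothetical. The map stage assigns each point to its nearest centroid, the reduce stage averages the points assigned to each centroid, and the driver feeds the new centroids into the next round. That hand-off is the inter-iteration data exchange the runtimes handle differently.

```python
import math
from collections import defaultdict


def nearest(point, centroids):
    """Return the index of the centroid closest to point (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centroids[i])),
    )


def map_phase(points, centroids):
    """Hypothetical map stage: emit (centroid_index, (point, 1)) for every point."""
    for point in points:
        yield nearest(point, centroids), (point, 1)


def reduce_phase(pairs, dim):
    """Hypothetical reduce stage: sum the points per centroid, then divide by the count."""
    sums = defaultdict(lambda: [0.0] * dim)
    counts = defaultdict(int)
    for idx, (point, n) in pairs:
        acc = sums[idx]
        for j, v in enumerate(point):
            acc[j] += v
        counts[idx] += n
    return {idx: [v / counts[idx] for v in sums[idx]] for idx in sums}


def kmeans(points, centroids, max_iters=20, tol=1e-9):
    """Driver: each iteration is one map/reduce round. The updated centroids are
    the state passed from one iteration to the next."""
    dim = len(points[0])
    for _ in range(max_iters):
        updated = reduce_phase(map_phase(points, centroids), dim)
        # Keep the previous centroid if no point was assigned to it this round.
        new_centroids = [updated.get(i, c) for i, c in enumerate(centroids)]
        shift = max(math.dist(a, b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift <= tol:  # converged: centroids stopped moving
            break
    return centroids
```

For example, kmeans([(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)], [(0.0, 0.0), (10.0, 10.0)]) converges after two rounds to [[0.0, 0.5], [10.0, 10.5]]. In a distributed runtime, each round would typically be a separate job or stage. Whether the centroids are written to files between rounds or streamed to long-lived computations is the kind of design difference the abstract describes.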
Keywords :
distributed processing; file organisation; fuzzy set theory; learning (artificial intelligence); pattern clustering; software performance evaluation; Dirichlet clustering algorithms; Granules runtime; Hadoop runtime; K-means clustering; distributed data clustering algorithm performance evaluation; file processing system; fuzzy k-means clustering; latent Dirichlet allocation algorithm; machine learning; streaming system; Algorithm design and analysis; Clustering algorithms; Distributed databases; Pipelines; Processor scheduling; Runtime; Semantics; Clustering; Distributed Stream Processing; Granules; Hadoop; Machine Learning; Mahout;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Utility and Cloud Computing (UCC), 2011 Fourth IEEE International Conference on
Conference_Location :
Melbourne, VIC, Australia
Print_ISBN :
978-1-4577-2116-8
Type :
conf
DOI :
10.1109/UCC.2011.15
Filename :
6123478