DocumentCode :
649462
Title :
A scalable algorithm for single-linkage hierarchical clustering on distributed-memory architectures
Author :
Hendrix, William ; Palsetia, Diana ; Ali Patwary, Md Mostofa ; Agrawal, Ankit ; Wei-keng Liao ; Choudhary, Alok
fYear :
2013
fDate :
13-14 Oct. 2013
Firstpage :
7
Lastpage :
13
Abstract :
Hierarchical clustering is a fundamental and widely-used clustering algorithm with many advantages over traditional partitional clustering. Due to the explosion in size of modern scientific datasets, there is a pressing need for scalable analytics algorithms, but good scaling is difficult to achieve for hierarchical clustering due to data dependencies inherent in the algorithm. To the best of our knowledge, no previous work on parallel hierarchical clustering has shown scalability beyond a couple hundred processes. In this paper, we present PINK, a scalable parallel algorithm for single-linkage hierarchical clustering based on decomposing a problem instance into two different types of subproblems. Despite the heterogeneous workloads, our algorithm exhibits good load balancing, as well as low memory requirements and a communication pattern that is both low-volume and deterministic. Evaluating PINK on up to 6050 processes, we find that it achieves speedups up to approximately 6600.
Keywords :
distributed memory systems; memory architecture; parallel algorithms; parallel architectures; pattern clustering; distributed memory architectures; parallel hierarchical clustering; partitional clustering; scalability; scalable analytics algorithms; scalable parallel algorithm; single linkage hierarchical clustering algorithm;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
Large-Scale Data Analysis and Visualization (LDAV), 2013 IEEE Symposium on
Conference_Location :
Atlanta, GA
Type :
conf
DOI :
10.1109/LDAV.2013.6675153
Filename :
6675153
Link To Document :
بازگشت