DocumentCode :
598597
Title :
A new scalable parallel DBSCAN algorithm using the disjoint-set data structure
Author :
Patwary, M. Mostofa Ali ; Palsetia, Diana ; Agrawal, Ankit ; Wei-keng Liao ; Manne, Fredrik ; Choudhary, Alok
Author_Institution :
Northwestern Univ., Evanston, IL, USA
fYear :
2012
fDate :
10-16 Nov. 2012
Firstpage :
1
Lastpage :
11
Abstract :
DBSCAN is a well-known density based clustering algorithm capable of discovering arbitrary shaped clusters and eliminating noise data. However, parallelization of DBSCAN is challenging as it exhibits an inherent sequential data access order. Moreover, existing parallel implementations adopt a master-slave strategy which can easily cause an unbalanced workload and hence result in low parallel efficiency. We present a new parallel DBSCAN algorithm (PDSDBSCAN) using graph algorithmic concepts. More specifically, we employ the disjoint-set data structure to break the access sequentiality of DBSCAN. In addition, we use a tree-based bottom-up approach to construct the clusters. This yields a better-balanced workload distribution. We implement the algorithm both for shared and for distributed memory. Using data sets containing up to several hundred million high-dimensional points, we show that PDSDBSCAN significantly outperforms the master-slave approach, achieving speedups up to 25.97 using 40 cores on shared memory architecture, and speedups up to 5,765 using 8,192 cores on distributed memory architecture.
Keywords :
distributed memory systems; graph theory; pattern clustering; shared memory systems; tree data structures; PDSDBSCAN; arbitrary shaped clusters; better-balanced workload distribution; density based clustering algorithm; disjoint-set data structure; distributed memory architecture; graph algorithmic concepts; master-slave strategy; noise data elimination; scalable parallel DBSCAN algorithm; sequential data access order; shared memory architecture; tree-based bottom-up approach; unbalanced workload; Clustering algorithms; Data structures; Instruction sets; Merging; Noise; Partitioning algorithms; Vegetation; Density based clustering; Disjoint-set data structure; Union-Find algorithm;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
High Performance Computing, Networking, Storage and Analysis (SC), 2012 International Conference for
Conference_Location :
Salt Lake City, UT
ISSN :
2167-4329
Print_ISBN :
978-1-4673-0805-2
Type :
conf
DOI :
10.1109/SC.2012.9
Filename :
6468492
Link To Document :
بازگشت