DocumentCode :
2191037
Title :
Distributed, Scalable Clustering for Detecting Halos in Terascale Astronomy Datasets
Author :
Daruru, Srivatsava ; Dhandapani, Sankari ; Gupta, Gunjan ; Iliev, Ilian ; Xu, Weijia ; Navratil, Paul ; Marín, Nena ; Ghosh, Joydeep
Author_Institution :
Dept. of Comput. Sci., Univ. of Texas at Austin, Austin, TX, USA
fYear :
2010
fDate :
13 Dec. 2010
Firstpage :
138
Lastpage :
147
Abstract :
Terascale astronomical datasets have the potential to provide unprecedented insights into the origins of our universe. However, automated techniques for determining regions of interest are a must if domain experts are to cope with the intractable amounts of simulation data. This paper addresses the important problem of locating and tracking high-density regions in space that generally correspond to halos and sub-halos and host galaxies. A density-based, mode-following clustering method called Automated Hierarchical Density Shaving (Auto-HDS) is adapted for this application. Auto-HDS can detect clusters of different densities while discarding the vast majority of background data. Two alternative parallel implementations of the algorithm, based respectively on the dataflow computational model and on Hadoop/MapReduce functional programming constructs, are realized and compared. Based on runtime performance and on scalability across compute cores and across increasing data volumes, we demonstrate the benefits of fine-grained parallelism. The proposed distributed and multithreaded Auto-HDS clustering algorithm is shown to produce high-quality clusters, to be computationally efficient, and to scale from 1 to 1024 compute cores.
Keywords :
astronomy computing; data flow computing; functional programming; multi-threading; pattern clustering; Hadoop; MapReduce; automated hierarchical density shaving; dataflow computational model; distributed AutoHDS clustering algorithm; fine grain parallelism; functional programming; halos detection; high density regions; host galaxies; multithreaded AutoHDS clustering algorithm; scalable clustering; terascale astronomy datasets; Astronomy; Distributed Clustering; Scalable; Terascale;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Data Mining Workshops (ICDMW), 2010 IEEE International Conference on
Conference_Location :
Sydney, NSW
Print_ISBN :
978-1-4244-9244-2
Electronic_ISBN :
978-0-7695-4257-7
Type :
conf
DOI :
10.1109/ICDMW.2010.26
Filename :
5693293