DocumentCode :
598567
Title :
Billion-particle SIMD-friendly two-point correlation on large-scale HPC cluster systems
Author :
Chhugani, J. ; Kim, Changkyu ; Shukla, Himanshu ; Park, Jongho ; Dubey, Pradeep ; Shalf, J. ; Simon, Horst D.
fYear :
2012
fDate :
10-16 Nov. 2012
Firstpage :
1
Lastpage :
11
Abstract :
The two-point correlation function (TPCF) is widely used in astronomy to characterize the distribution of matter/energy in the Universe and to help derive the physics that can be traced back to the creation of the Universe. However, it is prohibitively slow to compute for current-sized datasets, and will remain a critical bottleneck as dataset sizes grow to billions of particles and more, which makes TPCF a compelling benchmark application for future exa-scale architectures. State-of-the-art TPCF implementations do not map well to the underlying SIMD hardware and also suffer from load imbalance at large core counts. In this paper, we present a novel SIMD-friendly histogram update algorithm that exploits the spatial locality of histogram updates to achieve near-linear SIMD scaling. We also present a load-balancing scheme that combines a domain-specific initial static division of work with dynamic task migration to balance computation effectively across nodes. Using the Zin supercomputer at Lawrence Livermore National Laboratory (25,600 cores of Intel® Xeon® E5-2670, each with 256-bit SIMD), we achieve 90% parallel efficiency and 96% SIMD efficiency, and perform TPCF computation on a 1.7 billion particle dataset in 5.3 hours (at least 35× faster than previous approaches). In terms of performance per cost (measured in flops/$), we achieve at least an order-of-magnitude (11.1×) higher flops/$ than the best known results [1]. Consequently, we now have line of sight to the processing power needed for correlation computation on telescopic data with a billion or more particles.
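For orientation, the histogram update at the core of a TPCF computation can be illustrated with a brute-force pair count. The sketch below is not the SIMD-friendly algorithm or the load-balancing scheme the abstract describes; it is a minimal O(N²) baseline, and the function name `pair_count_histogram`, the half-open bin convention, and the sample points are assumptions made for illustration only.

```python
import bisect
import math
from itertools import combinations


def pair_count_histogram(points, bin_edges):
    """Count unordered point pairs per separation bin (hypothetical helper).

    bin_edges must be sorted ascending; bin i covers the half-open
    interval [bin_edges[i], bin_edges[i + 1]). Pairs whose separation
    falls outside all bins are ignored.
    """
    counts = [0] * (len(bin_edges) - 1)
    for p, q in combinations(points, 2):
        r = math.dist(p, q)  # Euclidean separation of the pair
        # Index of the bin whose left edge is the last one <= r.
        idx = bisect.bisect_right(bin_edges, r) - 1
        if 0 <= idx < len(counts):
            counts[idx] += 1  # the histogram update
    return counts


# Four illustrative 3-D points (made-up data, not from the paper).
points = [(0, 0, 0), (1, 0, 0), (0, 2, 0), (0, 0, 3)]
edges = [0.5, 1.5, 2.5, 4.0]
print(pair_count_histogram(points, edges))  # [1, 2, 3]
```

Every pair triggers one distance evaluation and one histogram increment, which is why the cost grows quadratically with particle count and why the efficiency of the histogram update matters at the billion-particle scale.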
Keywords :
astronomy computing; benchmark testing; correlation methods; mainframes; parallel architectures; performance evaluation; resource allocation; Lawrence Livermore National Laboratory; SIMD efficiency; SIMD hardware; SIMD-friendly histogram update algorithm; TPCF computation; Universe; Zin supercomputer; astronomy; billion-particle SIMD-friendly two-point correlation function; billion-particle telescopic data processing; correlation computation; domain-specific initial static work division; dynamic task migration; energy distribution; exa-scale architectures; large-scale HPC cluster systems; load-balancing scheme; load-imbalance; near-linear SIMD scaling; parallel efficiency; spatial histogram update locality; state-of-the-art TPCF implementations; Acceleration; Clustering algorithms; Computer architecture; Correlation; Histograms; Instruction sets;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
High Performance Computing, Networking, Storage and Analysis (SC), 2012 International Conference for
Conference_Location :
Salt Lake City, UT
ISSN :
2167-4329
Print_ISBN :
978-1-4673-0805-2
Type :
conf
DOI :
10.1109/SC.2012.24
Filename :
6468441