DocumentCode :
3428214
Title :
Large-scale energy-efficient graph traversal: A path to efficient data-intensive supercomputing
Author :
Satish, Nadathur ; Changkyu Kim ; Chhugani, J. ; Dubey, Pradeep
fYear :
2012
fDate :
10-16 Nov. 2012
Firstpage :
1
Lastpage :
11
Abstract :
Graph traversal is a widely used algorithm in a variety of fields, including social networks, business analytics, and high-performance computing among others. There has been a push for HPC machines to be rated not just in Petaflops, but also in "GigaTEPS" (billions of traversed edges per second), and the Graph500 benchmark has been established for this purpose. Graph traversal on single nodes has been well studied and optimized on modern CPU architectures. However, current cluster implementations suffer from high latency data communication with large volumes of transfers across nodes, leading to inefficiency in performance and energy consumption. In this work, we show that we can overcome these constraints using a combination of efficient low-overhead data compression techniques to reduce transfer volumes along with latency-hiding techniques. Using an optimized single node graph traversal algorithm [1], our novel cluster optimizations result in over 6.6X performance improvements over state-of-the-art data transfer techniques, and almost an order of magnitude in energy savings. Our resulting implementation of the Graph500 benchmark achieves 115 GigaTEPS on a 320-node/5120 core Intel® Endeavor cluster with Intel® Xeon® processors E5-2670, which matches the second ranked result in the recent November 2011 Graph500 list [2] with about 5.6X fewer nodes. Our cluster optimizations only have a 1.8X overhead in overall performance from the performance of the optimized single-node implementation, and allows for near-linear scaling with number of nodes. Our algorithm on 1024 nodes on Intel® Xeon® processor X5670-based systems (with lower per-node performance) for a large multi-Terabyte graph attained 195 GigaTEPS in performance, proving the high scalability of our algorithm. Our per-node performance is the highest in the top 10 of the Nov 2011 Graph500 list.
Keywords :
data compression; graph theory; microprocessor chips; parallel architectures; parallel machines; 320-node-5120 core Intel Endeavor cluster; CPU architectures; GigaTEPS; Graph500 benchmark; HPC machines; Intel Xeon processors E5-2670; business analytics; cluster implementations; efficient data-intensive supercomputing; high latency data communication; high-performance computing; large-scale energy-efficient graph traversal; latency-hiding techniques; low-overhead data compression techniques; multiterabyte graph; optimized single node graph traversal algorithm; optimized single-node implementation; petaflops; social networks; state-of-the-art data transfer techniques; Arrays; Bandwidth; Benchmark testing; Data transfer; Energy consumption; Optimization; Program processors;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
High Performance Computing, Networking, Storage and Analysis (SC), 2012 International Conference for
Conference_Location :
Salt Lake City, UT
ISSN :
2167-4329
Print_ISBN :
978-1-4673-0805-2
Type :
conf
DOI :
10.1109/SC.2012.70
Filename :
6468457
Link To Document :
بازگشت