DocumentCode :
2450088
Title :
Designing topology-aware collective communication algorithms for large scale InfiniBand clusters: Case studies with Scatter and Gather
Author :
Kandalla, Krishna ; Subramoni, Hari ; Vishnu, Abhinav ; Panda, Dhabaleswar K.
Author_Institution :
Dept. of Comput. Sci. & Eng., Ohio State Univ., Columbus, OH, USA
fYear :
2010
fDate :
19-23 April 2010
Firstpage :
1
Lastpage :
8
Abstract :
Modern high-performance computing systems are increasingly deployed in a hierarchical fashion, with multi-core computing platforms forming the base of the hierarchy. These systems usually comprise multiple racks, each rack consisting of a finite number of chassis and each chassis holding multiple compute nodes or blades based on multi-core architectures. The networks are also hierarchical, with multiple levels of switches. Message exchanges between processes in different racks involve multiple hops across different switches, which directly affects the performance of collective operations. In this paper, we address the challenges of detecting the topology of large-scale InfiniBand clusters and of leveraging this knowledge to design efficient topology-aware algorithms for collective operations. We also propose a communication model to analyze the communication costs of collective operations on large-scale supercomputing systems. We analyze the performance characteristics of two collectives, MPI_Gather and MPI_Scatter, on such systems and propose topology-aware algorithms for these operations. Our experimental results show that the proposed algorithms can improve the performance of these collective operations by almost 54% at the micro-benchmark level.
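The abstract does not spell out the proposed algorithms, so the following is only a minimal sketch of the general idea behind a topology-aware gather, not the authors' method. It assumes a hypothetical `rack_of` list that maps each rank to a rack ID, and it simulates the data movement in plain Python instead of calling MPI. A flat gather sends one message per remote rank across racks, while the two-level variant lets a per-rack leader aggregate first and forward a single message.

```python
from collections import defaultdict


def flat_gather_inter_rack_messages(rack_of, root=0):
    """Count rack-crossing messages when every rank sends directly to root."""
    return sum(
        1
        for rank, rack in enumerate(rack_of)
        if rank != root and rack != rack_of[root]
    )


def topology_aware_gather(rack_of, data, root=0):
    """Simulate a two-level gather (illustrative assumption, not the paper's code).

    Phase 1: ranks in a rack hand their data to a rack leader.
    Phase 2: each leader forwards one aggregated message to root.
    Returns (values ordered by rank, number of rack-crossing messages).
    """
    members = defaultdict(list)
    for rank, rack in enumerate(rack_of):
        members[rack].append(rank)

    gathered = [None] * len(rack_of)
    inter_rack = 0
    for rack, ranks in members.items():
        # The root leads its own rack; other racks use their lowest rank.
        leader = root if rack == rack_of[root] else ranks[0]
        # Phase 1: intra-rack aggregation at the leader.
        bundle = {r: data[r] for r in ranks}
        # Phase 2: one aggregated message per remote rack.
        if leader != root:
            inter_rack += 1
        for r, value in bundle.items():
            gathered[r] = value
    return gathered, inter_rack


# Hypothetical layout: 12 ranks spread evenly over 3 racks.
rack_of = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
data = [f"block{r}" for r in range(len(rack_of))]
result, crossings = topology_aware_gather(rack_of, data)
flat_crossings = flat_gather_inter_rack_messages(rack_of)
```

The sketch counts messages only. The aggregated messages are larger, so the total number of bytes crossing racks is unchanged, and any real benefit depends on the per-message and per-hop costs that a communication model would need to capture.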
Keywords :
multiprocessing systems; parallel machines; workstation clusters; MPI_Gather; MPI_Scatter; high performance computing systems; large scale InfiniBand clusters; large scale supercomputing systems; multicore architecture; multicore computing platform; topology-aware algorithm; topology-aware collective communication algorithm; Algorithm design and analysis; Blades; Clustering algorithms; Communication switching; Computer architecture; High performance computing; Large-scale systems; Network topology; Scattering; Switches;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Parallel & Distributed Processing, Workshops and PhD Forum (IPDPSW), 2010 IEEE International Symposium on
Conference_Location :
Atlanta, GA
Print_ISBN :
978-1-4244-6533-0
Type :
conf
DOI :
10.1109/IPDPSW.2010.5470853
Filename :
5470853