DocumentCode :
2536030
Title :
Improving Application Performance and Predictability Using Multiple Virtual Lanes in Modern Multi-core InfiniBand Clusters
Author :
Subramoni, H. ; Ping Lai ; Sur, S. ; Panda, D.K.
Author_Institution :
Dept. of Comput. Sci. & Eng., Ohio State Univ., Columbus, OH, USA
fYear :
2010
fDate :
13-16 Sept. 2010
Firstpage :
462
Lastpage :
471
Abstract :
Network congestion is an important factor affecting the performance of large-scale jobs in supercomputing clusters, especially with the wide deployment of multi-core processors. The blocking nature of current-day collectives makes such congestion a critical factor in their performance. On the other hand, modern interconnects like InfiniBand provide us with many novel features such as Virtual Lanes aimed at delivering better performance to end applications. Theoretical research in the field of network congestion indicates Head of Line (HoL) blocking as a common cause of congestion and the use of multiple virtual lanes as one of the ways to alleviate it. In this context, we make use of the multiple virtual lanes provided by the InfiniBand standard as a means to alleviate network congestion and thereby improve the performance of various high performance computing applications on modern multi-core clusters. We integrate our scheme into the MVAPICH2 MPI library. To the best of our knowledge, this is the first such implementation that takes advantage of the use of multiple virtual lanes at the MPI level. We perform various experiments at the native InfiniBand, microbenchmark, and application levels. The results of our experimental evaluation show that the use of multiple virtual lanes can improve the predictability of message arrival by up to 10 times in the presence of network congestion. Our microbenchmark level evaluation with multiple communication streams shows that the use of multiple virtual lanes can improve the bandwidth / latency / message rate of medium-sized messages by up to 13%. Through the use of multiple virtual lanes, we are also able to improve the performance of the Alltoall collective operation for medium message sizes by up to 20%. Performance improvement of up to 12% is also observed for the Alltoall collective operation through segregation of traffic into multiple virtual lanes when multiple jobs compete for the same network resource. We also see that our scheme can improve the performance of collective operations used inside the CPMD application by 11% and the overall performance of the CPMD application itself by up to 6%.
Keywords :
mainframes; microprocessor chips; parallel machines; telecommunication congestion control; virtual reality; workstation clusters; Alltoall collective operation; CPMD application; HoL blocking; MVAPICH2 MPI library; head of line blocking; message rate; microbenchmark level evaluation; multicore infiniband clusters; multicore processors; multiple communication streams; multiple virtual lanes; network congestion; supercomputing clusters; Bandwidth; Benchmark testing; Color; Delay; Fabrics; Quality of service; Time frequency analysis; High Performance Computing; InfiniBand; MPI; QoS in InfiniBand; Virtual Lanes;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Parallel Processing (ICPP), 2010 39th International Conference on
Conference_Location :
San Diego, CA
ISSN :
0190-3918
Print_ISBN :
978-1-4244-7913-9
Type :
conf
DOI :
10.1109/ICPP.2010.54
Filename :
5599188