DocumentCode :
2540435
Title :
Efficient Barrier and Allreduce on Infiniband clusters using multicast and adaptive algorithms
Author :
Mamidala, Amith R. ; Liu, Jiuxing ; Panda, Dhabaleswar K.
Author_Institution :
Dept. of Comput. Sci. & Eng., Ohio State Univ., Columbus, OH, USA
fYear :
2004
fDate :
20-23 Sept. 2004
Firstpage :
135
Lastpage :
144
Abstract :
Popular algorithms proposed in the literature for Barrier and Allreduce on clusters, such as pair-wise exchange, dissemination, and gather-broadcast, do not perform optimally when there is skew among the nodes in the cluster. In pair-wise exchange and dissemination, all nodes must arrive before each step can complete. The gather-broadcast algorithm assumes a fixed tree topology. We propose using the hardware multicast of InfiniBand to design an adaptive algorithm that performs well in the presence of skew. In this approach, the tree topology is not fixed but adapts to the skew: if the skew is sufficiently large, the last-arriving node becomes the root of the tree. We have carried out an in-depth evaluation of our scheme, using synchronization delay as the performance metric for Barrier and Allreduce in the presence of skew. Our performance evaluation shows that our design scales very well with system size. Our designs can reduce the synchronization delay by a factor of 2.28 for Barrier and by a factor of 2.18 for Allreduce. We have examined different skew scenarios and shown that the adaptive design performs either better than or comparably to the existing schemes.
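The abstract's argument about skew can be illustrated with a toy timing model. This is a minimal sketch, not the authors' algorithm: it assumes a uniform per-message latency `LAT`, a hypothetical heap-layout binary tree for the gather, and that the late node is known in advance (the adaptive design must discover it at run time and, per the abstract, re-roots the tree only when the skew is sufficiently large). It compares a dissemination barrier, a gather up a fixed tree followed by one multicast release, and the same gather with the last-arriving node as root. For Allreduce, the same messages would carry partial reductions; the model tracks only timing.

```python
"""Toy timing model of barrier synchronization delay under arrival skew.

Hypothetical assumptions (not taken from the paper): every point-to-point
message and every hardware multicast costs the same fixed latency LAT,
local computation is free, and synchronization delay is measured as
(time the last process leaves) - (time the last process arrives).
"""
import math

LAT = 1.0  # hypothetical per-message latency, arbitrary time units


def dissemination(arrivals):
    """Exit times for a dissemination barrier: in round k, rank i signals
    rank (i + 2**k) % n and waits for the signal from rank (i - 2**k) % n."""
    n = len(arrivals)
    t = list(arrivals)
    rounds = math.ceil(math.log2(n)) if n > 1 else 0
    for k in range(rounds):
        step = 2 ** k
        # A rank enters round k at t[i]; its partner's signal lands LAT later.
        t = [max(t[i], t[(i - step) % n] + LAT) for i in range(n)]
    return t


def gather_multicast(arrivals, root):
    """Exit times for a gather up a fixed binary tree rooted at `root`,
    followed by a single multicast release from the root.

    Hypothetical tree layout: rank (root + p) % n sits at heap position p,
    whose children are positions 2p+1 and 2p+2.
    """
    n = len(arrivals)
    rank_at = [(root + p) % n for p in range(n)]
    ready = [0.0] * n  # time position p holds its whole subtree's contributions
    for p in reversed(range(n)):  # children always have larger positions
        kids = [c for c in (2 * p + 1, 2 * p + 2) if c < n]
        ready[p] = max([arrivals[rank_at[p]]] + [ready[c] + LAT for c in kids])
    release = ready[0] + LAT  # one hardware multicast reaches every rank
    return [release] * n


def sync_delay(arrivals, exits):
    """Last exit minus last arrival."""
    return max(exits) - max(arrivals)


if __name__ == "__main__":
    arrivals = [0.0] * 8
    arrivals[5] = 10.0  # rank 5 arrives late: large skew
    print(sync_delay(arrivals, dissemination(arrivals)))        # 3.0
    print(sync_delay(arrivals, gather_multicast(arrivals, 0)))  # 3.0 (fixed root)
    print(sync_delay(arrivals, gather_multicast(arrivals, 5)))  # 1.0 (late rank is root)
```

With one rank arriving 10 time units late, the late rank's signal still needs three dissemination rounds to reach every rank, and in the fixed tree it must climb two levels before the multicast. Rooting the tree at the late rank leaves only the final multicast on the critical path.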
Keywords :
message passing; multicast communication; synchronisation; telecommunication network topology; workstation clusters; Allreduce; Barrier; Infiniband clusters; adaptive algorithms; adaptive design; fixed tree topology; hardware multicast; multicast algorithms; synchronization delay; Adaptive algorithm; Algorithm design and analysis; Clustering algorithms; Computer science; Delay; Hardware; Measurement; Multicast algorithms; Personal communication networks; Topology;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Cluster Computing, 2004 IEEE International Conference on
ISSN :
1552-5244
Print_ISBN :
0-7803-8694-9
Type :
conf
DOI :
10.1109/CLUSTR.2004.1392611
Filename :
1392611