DocumentCode :
576926
Title :
Can Network-Offload Based Non-blocking Neighborhood MPI Collectives Improve Communication Overheads of Irregular Graph Algorithms?
Author :
Kandalla, K. ; Buluç, A. ; Subramoni, H. ; Tomko, K. ; Vienne, J. ; Oliker, L. ; Panda, D.K.
Author_Institution :
Dept. of Comput. Sci. & Eng., Ohio State Univ., Columbus, OH, USA
fYear :
2012
fDate :
24-28 Sept. 2012
Firstpage :
222
Lastpage :
230
Abstract :
Graph-based computations are commonly used across various data intensive computing domains ranging from social networks to biological systems. On distributed memory systems, graph algorithms involve explicit communication between processes and often exhibit sparse, irregular behavior. Minimizing these communication overheads is critical to cater to the graph-theoretic analyses demands of emerging âbigdataâ applications. In this paper, we explore the challenges associated with reducing the communication overheads of a popular 2D Breadth First Search (BFS) implementation in the CombBLAS library. This BFS algorithm relies on two common MPI collectives, MPI Alltoallv and MPI Allgathervto exchange data between processes and they account for more than 20% of the overall run time. Re-designing parallel applications to take advantage of MPI-3 non-blocking collectives to achieve latency hiding is an active area of research. However, the 2D BFS algorithm in CombBLAS is not directly amenable for such a re-design through common overlap techniques such as, double-buffering. In this paper, we propose to re-design the BFS algorithm to leverage MPI-3 no blocking, neighborhood collective communication operations to achieve fine-grained computation/communication overlap. We also leverage the CORE-Direct network offload feature in theConnectX-2 InfiniBand adapter from Mellanox to design highly efficient and scalable non-blocking, neighborhood Alltoallv and Allgatherv collective operations. Our experimental evaluations show that we can improve the communication overheads of the2D BFS algorithm by up to 70%, with 1,936 processes.
Keywords :
application program interfaces; distributed memory systems; graph theory; libraries; message passing; parallel processing; tree searching; 2D BFS algorithm; 2D breadth first search implementation; BFS implementation; CORE-direct network offload feature; CombBLAS library; ConnectX-2 InfiniBand adapter; MPI Allgatherv; MPI Alltoallv; Mellanox; big data applications; communication overhead minimization; distributed memory systems; double-buffering; graph-based computations; graph-theoretic analysis; intensive computing domains; irregular graph algorithms; latency hiding; message passing interface; neighborhood collective communication operations; network-offload based nonblocking neighborhood MPI collectives; parallel applications; Algorithm design and analysis; Benchmark testing; Clustering algorithms; Libraries; Sorting; Topology; Vectors; 2D BFS Algorithms; Collective Offload and InfiniBand; MPI-3 non-blocking collectives;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
Cluster Computing Workshops (CLUSTER WORKSHOPS), 2012 IEEE International Conference on
Conference_Location :
Beijing
Print_ISBN :
978-1-4673-2893-7
Type :
conf
DOI :
10.1109/ClusterW.2012.40
Filename :
6355868
Link To Document :
بازگشت