DocumentCode :
2534369
Title :
High Performance Design and Implementation of Nemesis Communication Layer for Two-Sided and One-Sided MPI Semantics in MVAPICH2
Author :
Luo, Miao ; Potluri, Sreeram ; Lai, Ping ; Mancini, Emilio P. ; Subramoni, Hari ; Kandalla, Krishna ; Sur, Sayantan ; Panda, Dhabaleswar K.
Author_Institution :
Dept. of Comput. Sci. & Eng., Ohio State Univ., Columbus, OH, USA
fYear :
2010
fDate :
13-16 Sept. 2010
Firstpage :
377
Lastpage :
386
Abstract :
High End Computing (HEC) systems are being deployed with eight to sixteen compute cores, with 64 to 128 cores/node being envisioned for exascale systems. MVAPICH2 is a popular implementation of MPI-2 specifically designed and optimized for InfiniBand, iWARP and RDMA over Converged Ethernet (RoCE). MVAPICH2 is based on MPICH2 from ANL. Recently, MPICH2 has been redesigned in an effort to optimize intra-node communication for future many-core systems. The new communication layer in MPICH2 is called Nemesis, which is very well optimized for shared memory message passing and has a modular design for various high-performance interconnects. In this paper we explore the challenges involved in designing the next-generation MVAPICH2 stack, leveraging the Nemesis communication layer. We observe that Nemesis does not provide abstractions for one-sided communication. We propose an extended Nemesis interface for optimized one-sided communication and provide design details. Our experimental evaluation shows that our proposed one-sided interface extensions are able to provide significantly better performance than the basic Nemesis interface. For example, inter-node MPI_Put bandwidth increased from 1,800 MB/s to 3,000 MB/s and latency for small messages went down by 13%. Additionally, with our proposed designs, we are able to demonstrate performance gains with small messages when compared to the existing MVAPICH2 CH3 implementation. The designs proposed in this paper are a superset of the options currently available to MVAPICH2 users and provide the best combination of performance and modularity.
Keywords :
application program interfaces; local area networks; message passing; HEC systems; InfiniBand; RDMA; converged Ethernet; exascale systems; high end computing system; high performance design; high-performance interconnects; iWARP; intranode communication; many-core systems; nemesis communication layer; next-generation MVAPICH2 stack; one-sided MPI semantics; optimized one-sided communication; shared memory message passing; two-sided MPI semantics; Ethernet networks; Hardware; Open source software; Optimization; Semantics; Sockets; Synchronization; MPICH2; MVAPICH2; RMA;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Parallel Processing Workshops (ICPPW), 2010 39th International Conference on
Conference_Location :
San Diego, CA
ISSN :
1530-2016
Print_ISBN :
978-1-4244-7918-4
Electronic_ISBN :
1530-2016
Type :
conf
DOI :
10.1109/ICPPW.2010.58
Filename :
5599096