Title :
Zero-copy protocol for MPI using InfiniBand unreliable datagram
Author :
Koop, Matthew J. ; Sur, Sayantan ; Panda, Dhabaleswar K.
Author_Institution :
Network-Based Comput. Lab., Ohio State Univ., Columbus, OH
Abstract :
Memory copies are widely regarded as detrimental to overall application performance. High-performance systems make every effort to reduce the number of memory copies, especially those incurred during message passing. State-of-the-art implementations of message-passing libraries, such as MPI, use user-level networking protocols to reduce or eliminate memory copies. InfiniBand is an emerging user-level networking technology that is gaining rapid acceptance in several domains, including HPC. To eliminate message copies when transferring large messages, MPI libraries over InfiniBand employ "zero-copy" protocols that use remote direct memory access (RDMA). RDMA is available only in the connection-oriented transports of InfiniBand, such as reliable connection (RC). However, the unreliable datagram (UD) transport of InfiniBand has been shown to scale much better than the RC transport with regard to memory usage. In an optimal design, it should be possible to perform zero-copy message transfers over scalable transports such as UD. In this paper, we present our design of a novel zero-copy protocol built directly on the scalable UD transport. Our protocol thus achieves the twin objectives of scalability and good performance. Our analysis shows that uni-directional messaging bandwidth can be within 9% of what is achievable over RC for messages of 64 KB and above. Application benchmark evaluation shows that our design delivers a 21% speedup for the in.rhodo dataset for LAMMPS over a copy-based approach, giving performance within 1% of RC.
Keywords :
application program interfaces; memory protocols; message passing; transport protocols; InfiniBand unreliable datagram transport; InfiniBand user-level networking technology; MPI; connection-oriented transport; high-performance computing system; memory copy; message-passing library; remote direct memory access; zero-copy message transfer; zero-copy protocol; Access protocols; Bandwidth; Computer networks; Impedance; Laboratories; Large-scale systems; Libraries; Message passing; Scalability; Transport protocols;
Conference_Title :
Cluster Computing, 2007 IEEE International Conference on
Conference_Location :
Austin, TX
Print_ISBN :
978-1-4244-1387-4
Electronic_ISBN :
1552-5244
DOI :
10.1109/CLUSTR.2007.4629230