Title :
Efficient and scalable barrier over Quadrics and Myrinet with a new NIC-based collective message passing protocol
Author :
Yu, Weikuan ; Buntinas, Darius ; Graham, Rich L. ; Panda, Dhabaleswar K.
Author_Institution :
Dept. of Comput. & Info Sci., Ohio State Univ., Columbus, OH, USA
Abstract :
Summary form only given. Modern interconnects often have programmable processors in the network interface that can be used to offload communication processing from the host CPU. We explore different schemes to support collective operations at the network interface and propose a new collective protocol. With barrier as an initial case study, we have demonstrated that much of the communication processing can be greatly simplified with this collective protocol. Accordingly, we have designed and implemented efficient and scalable NIC-based barrier operations over two high-performance interconnects, Quadrics and Myrinet. Our evaluation shows that, over an 8-node Quadrics cluster with the Elan3 network, the NIC-based barrier operation achieves a barrier latency of only 5.60μs, a factor of 2.48 improvement over the Elanlib tree-based barrier operation. Over an 8-node Myrinet cluster with LANai-XP NIC cards, it achieves a barrier latency of 14.20μs, a factor of 2.64 improvement over the host-based barrier algorithm. Furthermore, an analytical model developed for the proposed scheme indicates that a NIC-based barrier operation on a 1024-node cluster can be performed with only 22.13μs latency over Quadrics and 38.94μs latency over Myrinet. These results indicate the potential for developing high-performance communication subsystems for next-generation clusters.
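The abstract does not say which barrier algorithm runs on the NIC. Purely as a hypothetical illustration of why log-depth barriers are expected to scale from 8 to 1024 nodes, the sketch below models a generic dissemination barrier, which completes in ceil(log2 N) rounds of point-to-point messages (3 rounds for 8 nodes, 10 for 1024). The function names are illustrative assumptions, not the authors' protocol or any Quadrics/Myrinet API.

```python
from __future__ import annotations


def dissemination_rounds(num_nodes: int) -> int:
    """Rounds needed by a dissemination barrier: ceil(log2 N), 0 for N == 1."""
    if num_nodes < 1:
        raise ValueError("num_nodes must be positive")
    # (N - 1).bit_length() equals ceil(log2 N) for N >= 1, with no float rounding.
    return (num_nodes - 1).bit_length()


def dissemination_schedule(rank: int, num_nodes: int) -> list[tuple[int, int]]:
    """(send_to, recv_from) peers for `rank`; in round k the peer distance is 2**k."""
    schedule = []
    for k in range(dissemination_rounds(num_nodes)):
        dist = 1 << k
        schedule.append(((rank + dist) % num_nodes, (rank - dist) % num_nodes))
    return schedule


def simulate_barrier(num_nodes: int) -> bool:
    """Return True if, after all rounds, every rank has transitively heard from every rank."""
    known = [{r} for r in range(num_nodes)]
    for k in range(dissemination_rounds(num_nodes)):
        dist = 1 << k
        # All sends in a round are treated as simultaneous: collect first, then merge.
        incoming = [set() for _ in range(num_nodes)]
        for r in range(num_nodes):
            incoming[(r + dist) % num_nodes] |= known[r]
        for r in range(num_nodes):
            known[r] |= incoming[r]
    return all(len(s) == num_nodes for s in known)
```

For example, `dissemination_schedule(0, 8)` gives `[(1, 7), (2, 6), (4, 4)]`: rank 0 signals ranks 1, 2 and 4 and waits on ranks 7, 6 and 4. Going from 8 to 1024 nodes adds only 7 rounds, although per-round cost on real hardware depends on factors this sketch does not model.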
Keywords :
message passing; multiprocessor interconnection networks; network interfaces; protocols; workstation clusters; Elan3 network; Myrinet; NIC-based barrier operation; Quadrics; collective message passing protocol; high performance communication subsystem; host CPU; host-based barrier algorithm; network interface; programmable processor; Elanlib tree-based barrier operation; Broadcasting; Computer networks; Computer science; Delay; Hardware; Laboratories; Mathematics; Message passing; Network interfaces; Protocols
Conference_Title :
Parallel and Distributed Processing Symposium, 2004. Proceedings. 18th International
Print_ISBN :
0-7695-2132-0
DOI :
10.1109/IPDPS.2004.1303191