DocumentCode
2450102
Title
Overlapping computation and communication: Barrier algorithms and ConnectX-2 CORE-Direct capabilities
Author
Graham, Richard L. ; Poole, Steve ; Shamis, Pavel ; Bloch, Gil ; Bloch, Noam ; Chapman, Hillel ; Kagan, Michael ; Shahar, Ariel ; Rabinovitz, Ishai ; Shainer, Gilad
Author_Institution
Oak Ridge Nat. Lab. (ORNL), Oak Ridge, TN, USA
fYear
2010
fDate
19-23 April 2010
Firstpage
1
Lastpage
8
Abstract
This paper explores the computation and communication overlap enabled by the new CORE-Direct hardware capabilities introduced in the ConnectX-2 InfiniBand Network Interface Card (NIC). We use the latency-dominated nonblocking barrier algorithm in this study and find that, at a process count of 64, a contiguous time slot of about 80% of the nonblocking barrier time is available for computation. This time slot increases as the number of participating processes increases. In contrast, Central Processing Unit (CPU) based implementations provide a time slot of up to 30% of the nonblocking barrier time. This bodes well for the scalability of simulations employing offloaded collective operations. These capabilities can be used to reduce the effects of system noise and, when nonblocking collective operations are used, may also hide the effects of application load imbalance.
Keywords
microprocessor chips; peripheral interfaces; ConnectX-2 CORE-Direct capabilities; InfiniBand; barrier algorithms; load imbalance; network interface card; offloaded collective operations; Acoustical engineering; Central Processing Unit; Computational modeling; Computer interfaces; Computer networks; Delay; Hardware; Network interfaces; Noise reduction; Scalability; Barrier; CORE-Direct; InfiniBand; Offload
fLanguage
English
Publisher
IEEE
Conference_Titel
Parallel & Distributed Processing, Workshops and Phd Forum (IPDPSW), 2010 IEEE International Symposium on
Conference_Location
Atlanta, GA
Print_ISBN
978-1-4244-6533-0
Type
conf
DOI
10.1109/IPDPSW.2010.5470854
Filename
5470854