DocumentCode :
2888001
Title :
Efficient offloading of collective communications in large-scale systems
Author :
Sancho, Jose Carlos ; Kerbyson, Darren J. ; Barker, Kevin J.
Author_Institution :
Performance & Archit. Lab. (PAL), Los Alamos Nat. Lab., Los Alamos, NM
fYear :
2007
fDate :
17-20 Sept. 2007
Firstpage :
169
Lastpage :
178
Abstract :
In parallel applications, communication overheads generally increase with processor count, and collective communication operations in particular can become a critical limiting factor in achieving high performance. In this paper we explore a novel technique to boost application performance by dedicating some processors in the system to collective operations. We demonstrate the viability and efficiency of this approach for the allreduce collective operation on a state-of-the-art cluster. Experimental results show that the collective latency can be reduced by 30% and that the communication overhead per processor is also very low, at 1.6 µs, which is an order of magnitude lower than with conventional implementations. Moreover, results on a large-scale scientific application (POP) show that this approach achieves 15% higher performance on 640 processors than the default collective implementation.
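The offloading idea in the abstract can be illustrated with a small conceptual sketch. It is not the authors' implementation, which ran on dedicated processors of a real cluster. Here one hypothetical "offload" thread performs a sum-allreduce on behalf of several compute threads. Each compute thread only posts its contribution and later collects the result, leaving it free to do other work in between. The names `offload_allreduce_server`, `compute_worker`, and `run`, and the use of Python threads and queues in place of processors and a network, are all assumptions made for illustration.

```python
import queue
import threading


def offload_allreduce_server(n_workers, inbox, outboxes):
    """Hypothetical dedicated reducer: gather one value per worker, sum, broadcast."""
    total = 0
    for _ in range(n_workers):
        _rank, value = inbox.get()  # blocks until some worker posts a value
        total += value
    for box in outboxes:            # send the reduced result back to every worker
        box.put(total)


def compute_worker(rank, inbox, outbox, results):
    """Hypothetical compute process: hand off its value, then pick up the result."""
    contribution = rank + 1          # stand-in for a locally computed value
    inbox.put((rank, contribution))  # cheap hand-off to the offload thread
    # ... independent computation could overlap with the reduction here ...
    results[rank] = outbox.get()     # collect the allreduce result


def run(n_workers=4):
    """Run one offloaded sum-allreduce and return every worker's result."""
    inbox = queue.Queue()
    outboxes = [queue.Queue() for _ in range(n_workers)]
    results = [None] * n_workers
    server = threading.Thread(
        target=offload_allreduce_server, args=(n_workers, inbox, outboxes)
    )
    workers = [
        threading.Thread(target=compute_worker, args=(r, inbox, outboxes[r], results))
        for r in range(n_workers)
    ]
    server.start()
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    server.join()
    return results


if __name__ == "__main__":
    print(run(4))  # each worker receives 1 + 2 + 3 + 4 = 10
```

In this sketch the compute threads never take part in the reduction itself. That is the property the abstract's low per-processor overhead figure refers to, although Python threads say nothing about the latencies reported for the real system.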
Keywords :
parallel processing; collective communication offloading; high performance computing; large-scale system; parallel application; Acceleration; Computer architecture; Computer science; Coprocessors; Costs; Hardware; Laboratories; Large-scale systems; Network interfaces; Parallel programming;
fLanguage :
English
Publisher :
ieee
Conference_Title :
Cluster Computing, 2007 IEEE International Conference on
Conference_Location :
Austin, TX
ISSN :
1552-5244
Print_ISBN :
978-1-4244-1387-4
Electronic_ISBN :
1552-5244
Type :
conf
DOI :
10.1109/CLUSTR.2007.4629229
Filename :
4629229