DocumentCode :
656224
Title :
Efficient Offloading of Parallel Kernels Using MPI_Comm_Spawn
Author :
Rinke, Sebastian ; Prabhakaran, Suraj ; Wolf, Felix
Author_Institution :
German Res. Sch. for Simulation Sci., RWTH Aachen Univ., Aachen, Germany
fYear :
2013
fDate :
1-4 Oct. 2013
Firstpage :
877
Lastpage :
884
Abstract :
Integrating accelerators into cluster systems is a current architectural trend in high-performance computing. Usually, these accelerators are many-core compute devices that are directly connected to individual cluster nodes via PCI Express. Recent accelerators, however, no longer require a host CPU and can even be integrated as self-contained nodes that communicate via MPI over their own network interface. This approach offers new opportunities for application developers, as compute kernels can now span multiple communicating accelerators, which better accommodates larger MPI-based code regions with the potential for massive node-level parallelism. However, it also raises the question of how to program such an environment. One instance of this novel cluster architecture is the DEEP cluster system, currently under development. Based on this hardware concept, we investigate the MPI_Comm_spawn process creation mechanism for offloading MPI-based distributed-memory compute kernels onto multiple network-attached accelerators. We identify limitations of MPI_Comm_spawn and present an offloading mechanism that incurs only a fraction of the overhead of a pure MPI_Comm_spawn solution.
Keywords :
application program interfaces; distributed memory systems; message passing; parallel processing; peripheral interfaces; DEEP cluster system; MPI Comm spawn process creation mechanism; MPI-based code regions; MPI-based distributed memory compute kernels; PCI express; Parallel Kernels Offloading; accelerators integration; cluster architecture; cluster systems; network interface; node-level parallelism; Computational modeling; Graphics processing units; Hardware; Kernel; Programming; Runtime; DEEP; Intel Xeon Phi; MPI_Comm_spawn; computation offloading; network-attached accelerators;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Parallel Processing (ICPP), 2013 42nd International Conference on
Conference_Location :
Lyon
ISSN :
0190-3918
Type :
conf
DOI :
10.1109/ICPP.2013.104
Filename :
6687428