Regional Information Center for Science and Technology - Efficient Offloading of Parallel Kernels Using MPI_Comm

DocumentCode :

656224

Title :

Efficient Offloading of Parallel Kernels Using MPI_Comm_Spawn

Author :

Rinke, Sebastian ; Prabhakaran, Suraj ; Wolf, Felix

Author_Institution :

German Res. Sch. for Simulation Sci., RWTH Aachen Univ., Aachen, Germany

Year :

2013

Date :

1-4 Oct. 2013

Firstpage :

877

Lastpage :

884

Abstract :

The integration of accelerators into cluster systems is currently one of the architectural trends in high-performance computing. Usually, these accelerators are many-core compute devices that are directly connected to individual cluster nodes via PCI Express. Recent accelerators, however, no longer require a host CPU and can even be integrated as self-contained nodes that communicate via MPI over their own network interface. This approach offers new opportunities for application developers, as compute kernels can now span multiple communicating accelerators and thus accommodate larger MPI-based code regions with the potential for massive node-level parallelism. However, it also raises the question of how to program such an environment. An instance of this novel cluster architecture is the DEEP cluster system, currently under development. Based on this hardware concept, we investigate the MPI_Comm_spawn process creation mechanism for offloading MPI-based distributed-memory compute kernels onto multiple network-attached accelerators. We identify limitations of MPI_Comm_spawn and present an offloading mechanism that incurs only a fraction of the overhead of a pure MPI_Comm_spawn solution.
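For readers unfamiliar with the mechanism the abstract investigates, the following is a minimal sketch of a plain MPI_Comm_spawn offload, not the authors' optimized mechanism. It assumes the third-party mpi4py bindings and a working MPI launcher; the kernel (`partial_kernel`), the `--mpi` flag, and the default worker count are hypothetical choices made only for illustration.

```python
"""Sketch: offload a kernel to spawned MPI processes (assumes mpi4py)."""
import sys


def partial_kernel(rank, size, n):
    """Sum of squares over the indices owned by one worker (cyclic split)."""
    return sum(i * i for i in range(rank, n, size))


def main(n=100, workers=4):
    # Imported lazily so the file can be loaded where MPI is not installed.
    from mpi4py import MPI

    parent = MPI.Comm.Get_parent()
    if parent == MPI.COMM_NULL:
        # Host side: spawn the workers; the call returns an intercommunicator.
        inter = MPI.COMM_SELF.Spawn(
            sys.executable, args=[__file__, "--mpi"], maxprocs=workers
        )
        inter.bcast(n, root=MPI.ROOT)  # send the problem size to all workers
        total = inter.reduce(None, op=MPI.SUM, root=MPI.ROOT)
        print("offloaded result:", total)
        inter.Disconnect()
    else:
        # Worker side: the spawned processes share their own COMM_WORLD,
        # so the kernel itself can be an MPI-parallel code region.
        world = MPI.COMM_WORLD
        n = parent.bcast(None, root=0)
        partial = partial_kernel(world.Get_rank(), world.Get_size(), n)
        parent.reduce(partial, op=MPI.SUM, root=0)
        parent.Disconnect()


# The (hypothetical) flag keeps the file importable without starting MPI.
if __name__ == "__main__" and "--mpi" in sys.argv:
    main()
```

Saved under a hypothetical name such as `offload.py`, the sketch would be started with something like `mpiexec -n 1 python offload.py --mpi`. Every offload in this form pays the full process-creation cost, which is the overhead the abstract's mechanism aims to reduce.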

Keywords :

application program interfaces; distributed memory systems; message passing; parallel processing; peripheral interfaces; DEEP cluster system; MPI_Comm_spawn process creation mechanism; MPI-based code regions; MPI-based distributed memory compute kernels; PCI Express; Parallel Kernels Offloading; accelerators integration; cluster architecture; cluster systems; network interface; node-level parallelism; Computational modeling; Graphics processing units; Hardware; Kernel; Programming; Runtime; DEEP; Intel Xeon Phi; MPI_Comm_spawn; computation offloading; network-attached accelerators

Language :

English

Publisher :

IEEE

Conference_Title :

2013 42nd International Conference on Parallel Processing (ICPP)

Conference_Location :

Lyon

ISSN :

0190-3918

Type :

conf

DOI :

10.1109/ICPP.2013.104

Filename :

6687428

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=656224