DocumentCode
8104
Title
Processing MPI Derived Datatypes on Noncontiguous GPU-Resident Data
Author
Jenkins, J. ; Dinan, James ; Balaji, Pavan ; Peterka, Tom ; Samatova, N.F. ; Thakur, Rajeev
Author_Institution
Dept. of Comput. Sci., North Carolina State Univ., Raleigh, NC, USA
Volume
25
Issue
10
fYear
2014
fDate
Oct. 2014
Firstpage
2627
Lastpage
2637
Abstract
Driven by the goals of efficient and generic communication of noncontiguous data layouts in GPU memory, for which solutions do not currently exist, we present a parallel, noncontiguous data-processing methodology through the MPI datatypes specification. Our processing algorithm utilizes a kernel on the GPU to pack arbitrary noncontiguous GPU data by enriching the datatypes encoding to expose a fine-grained, data-point level of parallelism. Additionally, the typically tree-based datatype encoding is preprocessed to enable efficient, cached access across GPU threads. Using CUDA, we show that the computational method outperforms DMA-based alternatives for several common data layouts as well as more complex data layouts for which reasonable DMA-based processing does not exist. Our method incurs low overhead for data layouts that closely match best-case DMA usage or that can be processed by layout-specific implementations. We additionally investigate usage scenarios for data packing that incur resource contention, identifying potential pitfalls for various packing strategies. We also demonstrate the efficacy of kernel-based packing in various communication scenarios, showing multifold improvement in point-to-point communication and evaluating packing within the context of the SHOC stencil benchmark and HACC mesh analysis.
Keywords
application program interfaces; data handling; graphics processing units; message passing; parallel architectures; CUDA; DMA-based processing; GPU memory; GPU threads; HACC mesh analysis; MPI derived datatypes processing; SHOC stencil benchmark; compute unified device architecture; fine-grained data-point parallelism level; graphics processing unit; kernel-based packing strategies; message passing interface; noncontiguous GPU-resident data; noncontiguous data layouts; parallel noncontiguous data-processing methodology; tree-based datatype encoding; Computer graphics; Data models; Graphics processing units; CUDA; MPI; datatype; graphics processing unit;
fLanguage
English
Journal_Title
Parallel and Distributed Systems, IEEE Transactions on
Publisher
IEEE
ISSN
1045-9219
Type
jour
DOI
10.1109/TPDS.2013.234
Filename
6600679