• DocumentCode
    8104
  • Title
    Processing MPI Derived Datatypes on Noncontiguous GPU-Resident Data
  • Author
    Jenkins, J. ; Dinan, James ; Balaji, Pavan ; Peterka, Tom ; Samatova, N.F. ; Thakur, Rajeev
  • Author_Institution
    Dept. of Comput. Sci., North Carolina State Univ., Raleigh, NC, USA
  • Volume
    25
  • Issue
    10
  • fYear
    2014
  • fDate
    Oct. 2014
  • Firstpage
    2627
  • Lastpage
    2637
  • Abstract
    Driven by the goals of efficient and generic communication of noncontiguous data layouts in GPU memory, for which solutions do not currently exist, we present a parallel, noncontiguous data-processing methodology through the MPI datatypes specification. Our processing algorithm utilizes a kernel on the GPU to pack arbitrary noncontiguous GPU data by enriching the datatypes encoding to expose a fine-grained, data-point level of parallelism. Additionally, the typically tree-based datatype encoding is preprocessed to enable efficient, cached access across GPU threads. Using CUDA, we show that the computational method outperforms DMA-based alternatives for several common data layouts as well as more complex data layouts for which reasonable DMA-based processing does not exist. Our method incurs low overhead for data layouts that closely match best-case DMA usage or that can be processed by layout-specific implementations. We additionally investigate usage scenarios for data packing that incur resource contention, identifying potential pitfalls for various packing strategies. We also demonstrate the efficacy of kernel-based packing in various communication scenarios, showing multifold improvement in point-to-point communication and evaluating packing within the context of the SHOC stencil benchmark and HACC mesh analysis.
  • Keywords
    application program interfaces; data handling; graphics processing units; message passing; parallel architectures; CUDA; DMA-based processing; GPU memory; GPU threads; HACC mesh analysis; MPI derived datatypes processing; SHOC stencil benchmark; compute unified device architecture; fine-grained data-point parallelism level; graphics processing unit; kernel-based packing strategies; message passing interface; noncontiguous GPU-resident data; noncontiguous data layouts; parallel noncontiguous data-processing methodology; tree-based datatype encoding; Computer graphics; Data models; Graphics processing units; CUDA; MPI; datatype; graphics processing unit;
  • fLanguage
    English
  • Journal_Title
    Parallel and Distributed Systems, IEEE Transactions on
  • Publisher
    ieee
  • ISSN
    1045-9219
  • Type
    jour
  • DOI
    10.1109/TPDS.2013.234
  • Filename
    6600679
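The abstract's central idea is to expose data-point-level parallelism: the tree-shaped datatype is preprocessed once, so each GPU thread can find the source offset of "its" element with plain index arithmetic and no tree walk. The sketch below illustrates that idea under stated assumptions. It is not the authors' encoding. `Basic`, `Vector`, `flatten`, `element_offset`, and `pack` are hypothetical names. Only MPI-vector-like types with nonnegative strides are modeled (no resized or struct types), and a sequential Python loop stands in for the kernel, with one iteration per thread.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class Basic:
    size: int  # bytes per element, e.g. 8 for a double


@dataclass
class Vector:
    count: int        # number of blocks
    blocklength: int  # child elements per block
    stride: int       # distance between block starts, in child extents (assumed >= 0)
    child: "Datatype"


Datatype = Union[Basic, Vector]


def extent(t: Datatype) -> int:
    # Simplified MPI extent: ignores lb/ub markers and resized types.
    if isinstance(t, Basic):
        return t.size
    return ((t.count - 1) * t.stride + t.blocklength) * extent(t.child)


def flatten(t: Datatype) -> Tuple[List[Tuple[int, int]], int]:
    # Preprocess the datatype tree into (count, byte_stride) levels,
    # ordered outermost to innermost, plus the basic element size.
    levels: List[Tuple[int, int]] = []
    while isinstance(t, Vector):
        child_ext = extent(t.child)
        levels.append((t.count, t.stride * child_ext))  # block level
        levels.append((t.blocklength, child_ext))        # within-block level
        t = t.child
    return levels, t.size


def element_offset(i: int, levels: List[Tuple[int, int]]) -> int:
    # Mixed-radix decomposition of a flat element index: the innermost
    # level varies fastest, so peel it off first.
    offset = 0
    for count, stride in reversed(levels):
        offset += (i % count) * stride
        i //= count
    return offset


def pack(src: bytes, t: Datatype) -> bytes:
    levels, elem_size = flatten(t)
    n = 1
    for count, _ in levels:
        n *= count
    out = bytearray(n * elem_size)
    for tid in range(n):  # each iteration models one independent GPU thread
        off = element_offset(tid, levels)
        out[tid * elem_size:(tid + 1) * elem_size] = src[off:off + elem_size]
    return bytes(out)
```

For example, packing one column of a row-major 4x4 matrix of 1-byte elements uses `pack(bytes(range(16)), Vector(4, 1, 4, Basic(1)))`, which gathers the bytes at offsets 0, 4, 8, and 12. Because each thread's offset depends only on its own index and the small precomputed `levels` table, a real kernel could keep that table in fast on-chip memory; this is one plausible reading of the abstract's "efficient, cached access," not a description of the paper's data structure.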