  • DocumentCode
    154128
  • Title
    HAND: A Hybrid Approach to Accelerate Non-contiguous Data Movement Using MPI Datatypes on GPU Clusters

  • Author
    Rong Shi; Xiaoyi Lu; Sreeram Potluri; Khaled Hamidouche; Jie Zhang; Dhabaleswar K. Panda

  • Author_Institution
    Dept. of Comput. Sci. & Eng., Ohio State Univ., Columbus, OH, USA
  • fYear
    2014
  • fDate
    9-12 Sept. 2014
  • Firstpage
    221
  • Lastpage
    230
  • Abstract
    An increasing number of MPI applications are being ported to take advantage of the compute power offered by GPUs. Data movement remains the major bottleneck on GPU clusters, especially when the data is non-contiguous, which is common in scientific applications. Existing techniques for optimizing MPI datatype processing, intended to improve the performance of non-contiguous data movement, handle only certain data patterns efficiently while incurring overheads for others. In this paper, we first propose a set of optimized techniques for handling different MPI datatypes. Next, we propose a novel framework (HAND) that enables hybrid and adaptive selection and tuning among these techniques to achieve better performance across all datatypes. Our experimental results using a modified DDTBench suite demonstrate up to a 98% reduction in datatype latency. We also apply this datatype-aware design to an N-Body particle simulation application. Performance evaluation of this application on a 64-GPU cluster shows that our proposed approach improves performance by up to 80% with struct datatypes and up to 54% with indexed datatypes compared to the best existing design. To the best of our knowledge, this is the first attempt to propose a hybrid and adaptive solution that integrates all existing schemes to optimize arbitrary non-contiguous data movement using MPI datatypes on GPU clusters.
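    To make "non-contiguous data movement" concrete, the following is a minimal CPU-side sketch, assuming the simplest strided layout: the one an MPI vector datatype describes, i.e. MPI_Type_vector(count, blocklength, stride, MPI_DOUBLE, &newtype). Before such data can be sent as a single message it is typically gathered (packed) into a contiguous buffer. The function name pack_vector is hypothetical, and this plain C loop is only an illustration of the pack step; it is not the paper's GPU kernels or the HAND framework's selection logic.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of MPI or HAND): packs `count` blocks of
 * `blocklen` doubles, where block i starts at src[i * stride], into the
 * contiguous buffer `dst`. This mirrors the layout described by
 * MPI_Type_vector(count, blocklen, stride, MPI_DOUBLE, ...).
 * The caller must ensure src holds at least (count - 1) * stride + blocklen
 * elements and dst holds at least count * blocklen elements.
 * Returns the number of elements written to dst. */
size_t pack_vector(const double *src, size_t count, size_t blocklen,
                   size_t stride, double *dst) {
    size_t out = 0;
    for (size_t i = 0; i < count; ++i) {
        /* Copy one contiguous block; the gap between blocks is skipped. */
        memcpy(dst + out, src + i * stride, blocklen * sizeof(double));
        out += blocklen;
    }
    return out;
}
```

    For example, with src = {0, 1, ..., 11}, count = 3, blocklen = 2 and stride = 4, the packed buffer is {0, 1, 4, 5, 8, 9}. Struct and indexed datatypes generalize this with per-block displacements and lengths, which is why a single packing strategy is unlikely to suit every pattern equally well.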
  • Keywords
    application program interfaces; graphics processing units; message passing; GPU clusters; HAND framework; MPI applications; MPI datatype handling; N-body particle simulation application; arbitrary noncontiguous data movement optimization; data patterns; datatype latency reduction; datatype-aware design; hybrid adaptive selection; hybrid approach-to-accelerate noncontiguous data movement; indexed datatypes; modified DDTBench suite; optimized techniques; performance improvement; struct datatypes; Arrays; Graphics processing units; Kernel; Shape; Three-dimensional displays; Tuning; Vectors; CUDA; Datatype; GPU; MPI;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Parallel Processing (ICPP), 2014 43rd International Conference on
  • Conference_Location
    Minneapolis, MN
  • ISSN
    0190-3918
  • Type
    conf
  • DOI
    10.1109/ICPP.2014.31
  • Filename
    6957231