DocumentCode
154128
Title
HAND: A Hybrid Approach to Accelerate Non-contiguous Data Movement Using MPI Datatypes on GPU Clusters
Author
Rong Shi ; Xiaoyi Lu ; Potluri, Sreeram ; Hamidouche, Khaled ; Jie Zhang ; Panda, Dhabaleswar K.
Author_Institution
Dept. of Comput. Sci. & Eng., Ohio State Univ., Columbus, OH, USA
fYear
2014
fDate
9-12 Sept. 2014
Firstpage
221
Lastpage
230
Abstract
An increasing number of MPI applications are being ported to take advantage of the compute power offered by GPUs. Data movement continues to be the major bottleneck on GPU clusters, more so when data is non-contiguous, which is common in scientific applications. Existing techniques for optimizing MPI data type processing, which aim to improve the performance of non-contiguous data movement, handle only certain data patterns efficiently and incur overheads for the others. In this paper, we first propose a set of optimized techniques to handle different MPI data types. Next, we propose a novel framework (HAND) that enables hybrid and adaptive selection among the different techniques, together with tuning, to achieve better performance with all data types. Our experimental results using the modified DDTBench suite demonstrate up to a 98% reduction in data type latency. We also apply this data type-aware design to an N-Body particle simulation application. Performance evaluation of this application on a 64-GPU cluster shows that our proposed approach can achieve up to 80% and 54% increases in performance with struct and indexed data types, respectively, compared to the existing best design. To the best of our knowledge, this is the first attempt to propose a hybrid and adaptive solution that integrates all existing schemes to optimize arbitrary non-contiguous data movement using MPI data types on GPU clusters.
Keywords
application program interfaces; graphics processing units; message passing; GPU clusters; HAND framework; MPI applications; MPI datatype handling; N-body particle simulation application; arbitrary noncontiguous data movement optimization; data patterns; datatype latency reduction; datatype-aware design; hybrid adaptive selection; hybrid approach-to-accelerate noncontiguous data movement; indexed datatypes; modified DDTBench suite; optimized techniques; performance improvement; struct datatypes; Arrays; Graphics processing units; Kernel; Shape; Three-dimensional displays; Tuning; Vectors; CUDA; Datatype; GPU; MPI;
fLanguage
English
Publisher
IEEE
Conference_Titel
Parallel Processing (ICPP), 2014 43rd International Conference on
Conference_Location
Minneapolis, MN
ISSN
0190-3918
Type
conf
DOI
10.1109/ICPP.2014.31
Filename
6957231