DocumentCode :
154128
Title :
HAND: A Hybrid Approach to Accelerate Non-contiguous Data Movement Using MPI Datatypes on GPU Clusters
Author :
Shi, Rong ; Lu, Xiaoyi ; Potluri, Sreeram ; Hamidouche, Khaled ; Zhang, Jie ; Panda, Dhabaleswar K.
Author_Institution :
Dept. of Comput. Sci. & Eng., Ohio State Univ., Columbus, OH, USA
fYear :
2014
fDate :
9-12 Sept. 2014
Firstpage :
221
Lastpage :
230
Abstract :
An increasing number of MPI applications are being ported to take advantage of the compute power offered by GPUs. Data movement continues to be the major bottleneck on GPU clusters, more so when data is non-contiguous, which is common in scientific applications. Existing techniques for optimizing MPI datatype processing, which aim to improve the performance of non-contiguous data movement, handle only certain data patterns efficiently and incur overheads for the others. In this paper, we first propose a set of optimized techniques to handle different MPI datatypes. Next, we propose a novel framework (HAND) that enables hybrid and adaptive selection among different techniques and tuning to achieve better performance with all datatypes. Our experimental results using the modified DDTBench suite demonstrate up to a 98% reduction in datatype latency. We also apply this datatype-aware design to an N-Body particle simulation application. Performance evaluation of this application on a 64-GPU cluster shows that our proposed approach can achieve up to 80% and 54% increases in performance by using struct and indexed datatypes, respectively, compared to the existing best design. To the best of our knowledge, this is the first attempt to propose a hybrid and adaptive solution that integrates all existing schemes to optimize arbitrary non-contiguous data movement using MPI datatypes on GPU clusters.
Keywords :
application program interfaces; graphics processing units; message passing; GPU clusters; HAND framework; MPI applications; MPI datatype handling; N-body particle simulation application; arbitrary noncontiguous data movement optimization; data patterns; datatype latency reduction; datatype-aware design; hybrid adaptive selection; hybrid approach-to-accelerate noncontiguous data movement; indexed datatypes; modified DDTBench suite; optimized techniques; performance improvement; struct datatypes; Arrays; Graphics processing units; Kernel; Shape; Three-dimensional displays; Tuning; Vectors; CUDA; Datatype; GPU; MPI;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Parallel Processing (ICPP), 2014 43rd International Conference on
Conference_Location :
Minneapolis, MN
ISSN :
0190-3918
Type :
conf
DOI :
10.1109/ICPP.2014.31
Filename :
6957231