Regional Information Center for Science and Technology - HAND: A Hybrid Approach to Accelerate Non-contiguous Data Movement Using MPI Datatypes on GPU Clusters

DocumentCode :

154128

Title :

HAND: A Hybrid Approach to Accelerate Non-contiguous Data Movement Using MPI Datatypes on GPU Clusters

Author :

Shi, Rong ; Lu, Xiaoyi ; Potluri, Sreeram ; Hamidouche, Khaled ; Zhang, Jie ; Panda, Dhabaleswar K.

Author_Institution :

Dept. of Comput. Sci. & Eng., Ohio State Univ., Columbus, OH, USA

Year :

2014

Date :

9-12 Sept. 2014

Firstpage :

221

Lastpage :

230

Abstract :

An increasing number of MPI applications are being ported to take advantage of the compute power offered by GPUs. Data movement continues to be the major bottleneck on GPU clusters, more so when the data is non-contiguous, which is common in scientific applications. Existing techniques for optimizing MPI datatype processing to improve the performance of non-contiguous data movement handle only certain data patterns efficiently and incur overheads for the others. In this paper, we first propose a set of optimized techniques to handle different MPI datatypes. Next, we propose a novel framework (HAND) that enables hybrid and adaptive selection among the different techniques, together with tuning, to achieve better performance with all datatypes. Our experimental results using the modified DDTBench suite demonstrate up to a 98% reduction in datatype latency. We also apply this datatype-aware design to an N-Body particle simulation application. Performance evaluation of this application on a 64-GPU cluster shows that our proposed approach can achieve up to 80% and 54% increases in performance by using struct and indexed datatypes, respectively, compared to the existing best design. To the best of our knowledge, this is the first attempt to propose a hybrid and adaptive solution that integrates all existing schemes to optimize arbitrary non-contiguous data movement using MPI datatypes on GPU clusters.

Keywords :

application program interfaces; graphics processing units; message passing; GPU clusters; HAND framework; MPI applications; MPI datatype handling; N-body particle simulation application; arbitrary noncontiguous data movement optimization; data patterns; datatype latency reduction; datatype-aware design; hybrid adaptive selection; hybrid approach-to-accelerate noncontiguous data movement; indexed datatypes; modified DDTBench suite; optimized techniques; performance improvement; struct datatypes; Arrays; Graphics processing units; Kernel; Shape; Three-dimensional displays; Tuning; Vectors; CUDA; Datatype; GPU; MPI;

Language :

English

Publisher :

IEEE

Conference_Title :

Parallel Processing (ICPP), 2014 43rd International Conference on

Conference_Location :

Minneapolis, MN

ISSN :

0190-3918

Type :

conf

DOI :

10.1109/ICPP.2014.31

Filename :

6957231

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=154128