Regional Information Center for Science and Technology - Evaluation of Inter- and Intra-node Data Transfer Efficiencies between GPU Devices and their Impact on Scalable Applications

DocumentCode :

611020

Title :

Evaluation of Inter- and Intra-node Data Transfer Efficiencies between GPU Devices and their Impact on Scalable Applications

Author :

Pena, A.J. ; Alam, S.R.

Author_Institution :

Dept. of Comput. Sci. & Eng., Univ. Jaume I, Castellon de la Plana, Spain

fYear :

2013

fDate :

13-16 May 2013

Firstpage :

144

Lastpage :

151

Abstract :

Data movement is of high relevance for GPU computing. Communication and performance efficiencies of applications and systems with GPU accelerators depend on on- and off-node data paths, thereby making tuning and optimization an increasingly complex task. In this paper we conduct an in-depth study to establish the parameters that influence the performance of data transfers between GPU devices located on the same node (on-node) and on separate nodes (off-node). We compare the most recent version of MVAPICH2, which features seamless remote GPU transfers, with our own low-level benchmarks, and discuss the bottlenecks that may arise. Data path performance and bottlenecks between GPU devices are analyzed and compared for two substantially different systems: an IBM iDataPlex relying on an InfiniBand QDR fabric with two on-node GPU devices, and a Cray XK6 featuring a single GPU per node and connected through a Gemini interconnect. Finally, we adapt LAMMPS, a GPU-accelerated application, to benefit from efficient inter-GPU data transfers, and validate our findings.

Keywords :

benchmark testing; data handling; electronic data interchange; graphics processing units; parallel processing; Cray XK6; GPU accelerators; GPU computing; GPU-accelerated application; Gemini interconnection; IBM iDataPlex; InfiniBand QDR fabric; LAMMPS; MVAPICH2; data movement; data path performance; interGPU data transfers; internode data transfer efficiencies; intranode data transfer efficiencies; low-level benchmarks; off-node data paths; on-node GPU devices; remote GPU transfers; Benchmark testing; Data transfer; Graphics processing units; Libraries; Memory management; Pipelines; Throughput; GPU computing; cluster computing; high performance computing; performance evaluation;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Cluster, Cloud and Grid Computing (CCGrid), 2013 13th IEEE/ACM International Symposium on

Conference_Location :

Delft

Print_ISBN :

978-1-4673-6465-2

Type :

conf

DOI :

10.1109/CCGrid.2013.15

Filename :

6546072

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=611020