Title :
GGAS: Global GPU address spaces for efficient communication in heterogeneous clusters
Author :
Oden, Lena ; Froning, Holger
Author_Institution :
Fraunhofer Inst., Univ. of Heidelberg, Heidelberg, Germany
Abstract :
Modern GPUs are powerful high-core-count processors, which are no longer used solely for graphics applications, but are also employed to accelerate computationally intensive general-purpose tasks. For utmost performance, GPUs are distributed throughout the cluster to process parallel programs. In fact, many recent high-performance systems in the TOP500 list are heterogeneous architectures. Despite being highly effective processing units, GPUs on different hosts are incapable of communicating without assistance from a CPU. As a result, communication between distributed GPUs suffers from unnecessary overhead, introduced by switching control flow from GPUs to CPUs and vice versa. Most communication libraries even require intermediate copies from GPU memory to host memory. This overhead in particular penalizes small data movements and synchronization operations, reduces efficiency and limits scalability. In this work we introduce global address spaces to facilitate direct communication between distributed GPUs without CPU involvement. Avoiding context switches and unnecessary copying dramatically reduces communication overhead. We evaluate our approach using a variety of workloads including low-level latency and bandwidth benchmarks, basic synchronization primitives like barriers, and a stencil computation as an example application. We see performance benefits of up to 2× for basic benchmarks and up to 1.67× for stencil computations.
Keywords :
graphics processing units; parallel processing; storage allocation; storage management; synchronisation; CPU; GGAS; GPU memory; bandwidth benchmark; communication libraries; communication overhead reduction; computationally intensive general-purpose task acceleration; data movement; distributed GPU; global GPU address space; heterogeneous architecture; heterogeneous cluster communication; high-core-count processors; high-performance system; host memory; low-level latency; parallel program processing; stencil computation; switching control flow; synchronization operation; Acceleration; Benchmark testing; Indexes; Instruction sets; Kernel; Message systems; GPU communication; bulk-synchronous execution; hybrid computing clusters
Conference_Title :
Cluster Computing (CLUSTER), 2013 IEEE International Conference on
Conference_Location :
Indianapolis, IN
DOI :
10.1109/CLUSTER.2013.6702638