Title :
Coordinating GPU Threads for OpenMP 4.0 in LLVM
Author :
Bertolli, Carlo ; Antao, Samuel F. ; Eichenberger, Alexandre E. ; O'Brien, Kevin ; Sura, Zehra ; Jacob, Arpith C. ; Chen, Tong ; Sallenave, Olivier
Author_Institution :
IBM T.J. Watson Res. Center, Yorktown Heights, NY, USA
Abstract :
GPU devices are becoming critical building blocks of high-performance platforms for performance and energy-efficiency reasons. As a consequence, parallel programming environments such as OpenMP were extended to support offloading code to such devices. OpenMP compilers are faced with offering an efficient implementation of device-targeting constructs. One main issue in implementing OpenMP on a GPU is efficiently supporting both sequential and parallel regions, as GPUs are optimized only to execute highly parallel workloads. Multiple solutions to this issue were proposed in previous research. In this paper, we propose a method to coordinate threads in an NVIDIA GPU that is both efficient and easily integrated as part of a compiler. To support our claims, we developed CUDA programs that mimic multiple coordination schemes and we compare their performance. We show that a scheme based on dynamic parallelism performs poorly compared to inspector-executor schemes that we introduce in this paper. We also discuss how to integrate these schemes into the LLVM compiler infrastructure.
Keywords :
graphics processing units; multi-threading; parallel architectures; program compilers; CUDA programs; GPU devices; GPU threads; LLVM compiler infrastructure; NVIDIA GPU; OpenMP 4.0; OpenMP compilers; code offloading; dynamic parallelism; graphics processing unit; high-performance platforms; inspector-executor schemes; parallel programming environment; parallel regions; parallel workloads; sequential regions; Acceleration; Graphics processing units; Kernel; Parallel processing; Performance evaluation; Synchronization;
Conference_Title :
LLVM Compiler Infrastructure in HPC (LLVM-HPC), 2014
DOI :
10.1109/LLVM-HPC.2014.10