Title :
Using GPU Shared Memory with a Directive-Based Approach
Author :
Wei Ding; Ligang Lu; Mauricio Araya-Polo; Amik St-Cyr; Detlef Hohl; Barbara M. Chapman
Author_Institution :
Dept. of Comput. Sci., Univ. of Houston, Houston, TX, USA
Abstract :
Graphics Processing Units (GPUs) have been increasingly adopted by the High-Performance Computing community. Their unique hardware architecture supports hundreds or thousands of lightweight threads in a more power-efficient manner than traditional CPUs, and with higher overall performance. This motivates porting highly parallel applications to GPUs. Programming GPUs is not a trivial task, particularly for programmers familiar with x86-like architectures. CUDA and OpenCL are two low-level programming APIs designed to ease GPU programming. Unfortunately, the resulting GPU codes depart greatly from traditional codes in both syntax and structure, which makes them hard to maintain. In order to keep the original code structure, directive-based programming models have been developed (OpenACC, HMPP, etc.). In such programming models, the code is augmented with directives (as when using OpenMP) that guide the compiler to generate CUDA/OpenCL code automatically. To optimize performance, code restructuring is needed to make full and specific use of GPU hardware advantages, e.g., GPU shared memory. In this paper, we explore various directive-based approaches to port a well-known Oil and Gas industry algorithm (Reverse Time Migration, or RTM) to GPUs while trying to balance code portability and performance. Our HMPP implementation achieves 85% of the performance of the highly optimized CUDA version available at the time of this work, in the summer of 2013.
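Example (illustrative, not part of the original record) :
A minimal sketch of the directive-based pattern the abstract describes, assuming an OpenACC-style annotation of a simple 1-D three-point stencil rather than the paper's HMPP-annotated RTM kernel; the function smooth_1d and its parameters are hypothetical. The parallel loop directive asks the compiler to generate a GPU kernel from the loop, while the cache clause hints that the reused stencil window may be staged in on-chip shared memory.

/* Illustrative only: OpenACC-annotated three-point smoothing loop.
 * The cache clause suggests keeping the reused window of `in`
 * in fast on-chip (shared) memory on NVIDIA GPUs.
 * Without an OpenACC compiler, the pragmas are ignored and this
 * builds as plain C. */
void smooth_1d(const float *restrict in, float *restrict out, int n)
{
    #pragma acc parallel loop copyin(in[0:n]) copyout(out[0:n])
    for (int i = 1; i < n - 1; ++i) {
        #pragma acc cache(in[i-1:3])
        out[i] = 0.25f * in[i - 1] + 0.5f * in[i] + 0.25f * in[i + 1];
    }
}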
Keywords :
graphics processing units; parallel architectures; parallel programming; shared memory systems; CUDA; GPU shared memory; HMPP implementation; RTM; code portability; directive-based programming models; graphic processing units; hybrid multicore parallel programming; oil-and-gas industry algorithm; reverse time migration; Computer architecture; Graphics processing units; Imaging; Instruction sets; Kernel; Programming; Three-dimensional displays; CUDA; GPU; RTM; directive-based; shared memory;
Conference_Title :
2014 IEEE International Parallel & Distributed Processing Symposium Workshops (IPDPSW)
Conference_Location :
Phoenix, AZ
Print_ISBN :
978-1-4799-4117-9
DOI :
10.1109/IPDPSW.2014.120