Title :
A Compiler Translate Directive-Based Language to Optimized CUDA
Author :
Feng Li ; Hong An ; Weihao Liang ; Xiaoqiang Li ; Yichao Cheng ; Xia Jiang
Author_Institution :
Sch. of Comput. Sci. & Technol., Univ. of Sci. & Technol. of China, Hefei, China
Abstract :
Graphics processing units (GPUs) provide a low-cost platform for accelerating high-performance computations. New programming languages, such as CUDA and OpenCL, make GPU programming attractive to programmers. However, programming GPUs is still cumbersome for two reasons: tedious performance optimization and lack of portability. First, optimizing an algorithm for a specific GPU is time-consuming and requires a thorough understanding of both the algorithm and the underlying hardware. Unoptimized CUDA programs typically achieve only a small fraction of peak GPU performance. Second, CUDA programs lack performance portability between different GPUs. Moving code from one GPU to another while maintaining the desired performance is a non-trivial task that often requires significant time. In this paper, we propose an optimizing compiler that translates a representative high-level directive-based language to CUDA and performs a wide variety of optimizations to generate efficient GPU code. We alleviate the portability problem of current GPU programming methods by using a high-level directive-based language that provides a unified abstraction for currently popular CPU-GPU heterogeneous systems. Various optimizations, mainly memory system optimizations, are applied automatically by our compiler to produce optimized CUDA code. Experiments on the Rodinia benchmark with three different input sizes show that our compiler achieves, on average, 70%, 75%, and 84% of the performance of hand-written code, respectively.
Keywords :
graphics processing units; optimising compilers; parallel architectures; parallel languages; parallel programming; CUDA code; CUDA programs; GPU performance; GPU programming; GPU programming methods; OpenCL; graphics processing units; hand-written code; high level directive-based language; high performance computations; memory system optimizations; optimized compiler; performance portability problem; programming languages; rodinia benchmark; Arrays; Graphics processing units; Hardware; Instruction sets; Kernel; Optimization; Parallel processing; Compiler; Directive-based language; GPU; Performance Optimization; Portability;
Conference_Title :
2014 IEEE Intl Conf on High Performance Computing and Communications, 2014 IEEE 6th Intl Symp on Cyberspace Safety and Security, 2014 IEEE 11th Intl Conf on Embedded Software and Syst (HPCC, CSS, ICESS)
Print_ISBN :
978-1-4799-6122-1
DOI :
10.1109/HPCC.2014.162