Title :
CU2CL: A CUDA-to-OpenCL Translator for Multi- and Many-Core Architectures
Author :
Martinez, Gabriel ; Gardner, Mark ; Feng, Wu-chun
Author_Institution :
Dept. of Comput. Sci., Virginia Tech, Blacksburg, VA, USA
Abstract :
The use of graphics processing units (GPUs) in high-performance parallel computing continues to become more prevalent, often as part of a heterogeneous system. For years, CUDA has been the de facto programming environment for nearly all general-purpose GPU (GPGPU) applications. However, the framework is available only on NVIDIA GPUs, so using additional multi- or many-core devices has traditionally required reimplementation in other frameworks. On the other hand, OpenCL provides an open and vendor-neutral programming environment and runtime system. With implementations available for CPUs, GPUs, and other types of accelerators, OpenCL therefore holds the promise of a "write once, run anywhere" ecosystem for heterogeneous computing. Given the many similarities between CUDA and OpenCL, manually porting a CUDA application to OpenCL is typically straightforward, albeit tedious and error-prone. In response to this issue, we created CU2CL, an automated CUDA-to-OpenCL source-to-source translator with a novel design that reuses the Clang compiler framework. Currently, the CU2CL translator covers the primary constructs found in the CUDA runtime API, and we have successfully translated many applications from the CUDA SDK and the Rodinia benchmark suite. The performance of applications automatically translated by CU2CL is on par with that of their manually ported counterparts.
Keywords :
application program interfaces; coprocessors; multiprocessing systems; parallel architectures; program compilers; program interpreters; programming environments; API; CPU; CU2CL; CUDA SDK; Clang compiler framework reusing; NVIDIA GPU; Rodinia benchmark suite; automated CUDA-to-OpenCL source-to-source translator; de facto programming environment; general-purpose GPU application; graphics processing units; heterogeneous computing; heterogeneous system; many-core architecture; multicore architecture; parallel computing; runtime system; vendor-neutral programming environment; Graphics processing unit; Instruction sets; Kernel; Libraries; Memory management; Programming; Runtime; CUDA; Clang; OpenCL; abstract syntax tree; compiler; source-to-source translation;
Conference_Title :
Parallel and Distributed Systems (ICPADS), 2011 IEEE 17th International Conference on
Conference_Location :
Tainan
Print_ISBN :
978-1-4577-1875-5
DOI :
10.1109/ICPADS.2011.48