DocumentCode
1984652
Title
Automatic Optimization of In-Flight Memory Transactions for GPU Accelerators Based on a Domain-Specific Language for Medical Imaging
Author
Membarth, Richard ; Hannig, Frank ; Teich, Jürgen ; Körner, Mario ; Eckert, Wieland
Author_Institution
Dept. of Comput. Sci., Univ. of Erlangen-Nuremberg, Erlangen, Germany
fYear
2012
fDate
25-29 June 2012
Firstpage
211
Lastpage
218
Abstract
Efficient memory bandwidth utilization for GPU accelerators is crucial for memory-bound applications. In medical imaging, the performance of many kernels is limited by the available memory bandwidth, since only a few operations are performed per pixel. For such kernels, only a fraction of the compute power provided by GPU accelerators can be exploited, and performance is predetermined by memory bandwidth. As a remedy, this paper investigates the optimal utilization of available memory bandwidth by increasing the number of in-flight memory transactions. Instead of doing this manually for different GPU accelerators, the required CUDA and OpenCL code is automatically generated from descriptions in a Domain-Specific Language (DSL) for the considered application domain. Moreover, the DSL is extended to also support global reduction operators. We show that the generated target-specific code significantly improves bandwidth utilization for memory-bound kernels. Moreover, the generated code achieves performance competitive with the GPU back end of the widely used image processing library OpenCV.
Keywords
graphics processing units; medical image processing; parallel architectures; storage management; CUDA; GPU accelerator; automatic optimization; domain-specific language; global reduction operator; in-flight memory transaction; medical imaging; memory bandwidth utilization; memory bound application; memory-bound kernel; optimal utilization; target-specific code; Bandwidth; DSL; Graphics processing unit; Image processing; Instruction sets; Kernel; Memory management; GPU; OpenCL; code generation; global operators; reductions
fLanguage
English
Publisher
IEEE
Conference_Titel
Parallel and Distributed Computing (ISPDC), 2012 11th International Symposium on
Conference_Location
Munich/Garching, Bavaria
Print_ISBN
978-1-4673-2599-8
Type
conf
DOI
10.1109/ISPDC.2012.36
Filename
6341514