• DocumentCode
    1984652
  • Title

    Automatic Optimization of In-Flight Memory Transactions for GPU Accelerators Based on a Domain-Specific Language for Medical Imaging

  • Author

    Membarth, Richard ; Hannig, Frank ; Teich, Jürgen ; Körner, Mario ; Eckert, Wieland

  • Author_Institution
    Dept. of Comput. Sci., Univ. of Erlangen-Nuremberg, Erlangen, Germany
  • fYear
    2012
  • fDate
    25-29 June 2012
  • Firstpage
    211
  • Lastpage
    218
  • Abstract
    Efficient memory bandwidth utilization on GPU accelerators is crucial for memory-bound applications. In medical imaging, the performance of many kernels is limited by the available memory bandwidth because only a few operations are performed per pixel. For such kernels, only a fraction of the compute power provided by GPU accelerators can be exploited, and performance is determined by memory bandwidth. As a remedy, this paper investigates the optimal utilization of the available memory bandwidth by increasing the number of in-flight memory transactions. Instead of doing this manually for each GPU accelerator, the required CUDA and OpenCL code is generated automatically from descriptions in a Domain-Specific Language (DSL) for the considered application domain. Moreover, the DSL is extended to support global reduction operators. We show that the generated target-specific code significantly improves bandwidth utilization for memory-bound kernels and achieves performance competitive with the GPU back end of the widely used image processing library OpenCV.
  • Keywords
    graphics processing units; medical image processing; parallel architectures; storage management; CUDA; GPU accelerator; automatic optimization; domain-specific language; global reduction operator; in-flight memory transaction; medical imaging; memory bandwidth utilization; memory bound application; memory-bound kernel; optimal utilization; target-specific code; Bandwidth; DSL; Graphics processing unit; Image processing; Instruction sets; Kernel; Memory management; GPU; OpenCL; code generation; global operators; reductions
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Parallel and Distributed Computing (ISPDC), 2012 11th International Symposium on
  • Conference_Location
    Munich/Garching, Bavaria
  • Print_ISBN
    978-1-4673-2599-8
  • Type

    conf

  • DOI
    10.1109/ISPDC.2012.36
  • Filename
    6341514