DocumentCode :
625655
Title :
Kernel Specialization for Improved Adaptability and Performance on Graphics Processing Units (GPUs)
Author :
Moore, Neil ; Leeser, Miriam ; Smith King, Laurie
Author_Institution :
MathWorks, Natick, MA, USA
fYear :
2013
fDate :
20-24 May 2013
Firstpage :
1037
Lastpage :
1048
Abstract :
Graphics processing units (GPUs) offer significant speedups over CPUs for certain classes of applications. However, programming for GPUs is challenging. There are many parameters that affect performance and their values may change depending on both problem instance and GPU hardware specifics. In addition, most GPU kernels are compiled once; performance optimizations are applied at application compile time. As a result, many GPU libraries and programs have limited adaptability to variations among problem instances and hardware configurations. These factors limit code reuse and the applicability of GPU computing to a wider variety of problems. This paper introduces GPGPU kernel specialization, a technique used to describe highly adaptable kernels that exhibit high performance across a wide range of programmer variables as well as different generations of GPUs. We also introduce our GPU Prototyping Framework (GPU-PF) for dynamic runtime generation of customized GPU kernels incorporating both problem and implementation-specific parameters. GPU-PF fully separates the GPU and CPU code so the GPU code can be compiled during program execution once all the parameters are known. This work explores the implementation and parameterization of two real world applications targeting two generations of NVIDIA CUDA-enabled GPUs using kernel specialization and GPU-PF: large template matching and cone-beam image reconstruction via backprojection. Starting with high performance GPU kernels that compare favorably to multi-threaded reference implementations, kernel specialization is shown to increase adaptability while providing performance improvements including improved run time and reduction in resource usage. Kernel specialization offers productivity benefits, improved library code, and a means to increase the parameterizability of GPGPU implementations.
Keywords :
graphics processing units; multi-threading; parallel architectures; program compilers; resource allocation; software libraries; software reusability; CPU code; GPGPU kernel specialization; GPU code; GPU computing; GPU hardware specifics; GPU libraries; GPU prototyping framework; GPU-PF; NVIDIA CUDA-enabled GPU; application compile time; backprojection; code reuse; cone-beam image reconstruction; customized GPU kernel; dynamic runtime generation; graphics processing unit; hardware configuration; high performance GPU kernel; library code; multithreaded reference implementation; parameterizability; parameterization; performance optimization; productivity benefit; program execution; programmer variable; programming; resource usage reduction; speedup; template matching; Graphics processing units; Hardware; Kernel; Libraries; Optimization; Registers; Runtime; GPU; compilation; performance; template matching;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Parallel & Distributed Processing (IPDPS), 2013 IEEE 27th International Symposium on
Conference_Location :
Boston, MA
ISSN :
1530-2075
Print_ISBN :
978-1-4673-6066-1
Type :
conf
DOI :
10.1109/IPDPS.2013.31
Filename :
6569883