Regional Information Center for Science and Technology (RICEST) - Optimal loop unrolling for GPGPU programs

DocumentCode :

2440870

Title :

Optimal loop unrolling for GPGPU programs

Author :

Murthy, Giridhar Sreenivasa ; Ravishankar, Mahesh ; Baskaran, Muthu Manikandan ; Sadayappan, P.

Author_Institution :

Dept. of Comput. Sci. & Eng., Ohio State Univ., Columbus, OH, USA

Year :

2010

Date :

19-23 April 2010

Firstpage :

Lastpage :

Abstract :

Graphics Processing Units (GPUs) are massively parallel, many-core processors with tremendous computational power and very high memory bandwidth. With the advent of general purpose programming models such as NVIDIA's CUDA and the new standard OpenCL, general purpose programming using GPUs (GPGPU) has become very popular. However, the GPU architecture and programming model have brought with them many new challenges and opportunities for compiler optimizations. One such classical optimization is loop unrolling. Current GPU compilers perform limited loop unrolling. In this paper, we attempt to understand the impact of loop unrolling on GPGPU programs. We develop a semi-automatic, compile-time approach for identifying optimal unroll factors for suitable loops in GPGPU programs. In addition, we propose techniques for reducing the number of unroll factors evaluated, based on the characteristics of the program being compiled and the device being compiled to. We use these techniques to evaluate the effect of loop unrolling on a range of GPGPU programs and show that we correctly identify the optimal unroll factors. The optimized versions run up to 70 percent faster than the unoptimized versions.

Keywords :

computer graphic equipment; coprocessors; parallel programming; program compilers; GPGPU programs; GPU architecture; GPU compilers; NVIDIA CUDA; OpenCL; compiler optimization; general purpose programming; graphics processing units; massively parallel many-core processors; optimal loop unrolling; programming model; Central Processing Unit; Computer graphics; Computer science; Concurrent computing; Linear programming; Optimizing compilers; Power engineering and energy; Power engineering computing; Program processors; Registers; Compiler optimizations; GPGPU; Loop Unrolling;

Language :

English

Publisher :

IEEE

Conference_Title :

Parallel & Distributed Processing (IPDPS), 2010 IEEE International Symposium on

Conference_Location :

Atlanta, GA

ISSN :

1530-2075

Print_ISBN :

978-1-4244-6442-5

Type :

conf

DOI :

10.1109/IPDPS.2010.5470423

Filename :

5470423

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2440870