Title :
Automatic Optimization of Thread Mapping for a GPGPU Programming Framework
Author :
Ohno, Kazuhiko ; Kamiya, Tomoharu ; Maruyama, Takanori ; Matsumoto, Masaki
Author_Institution :
Dept. of Inf. Eng., Mie Univ., Mie, Japan
Abstract :
Although general-purpose computation on GPUs (GPGPU) is widely used for high-performance computing, standard programming frameworks such as CUDA and OpenCL are still difficult to use. They require low-level specifications, and hand-optimization is a large burden. We are therefore developing an easier framework named MESI-CUDA. Based on a virtual shared memory model, MESI-CUDA hides low-level memory management and data transfer from the user. The compiler generates the low-level code and also optimizes memory accesses by applying conventional hand-optimization techniques. However, GPU threads are created in the same way as in CUDA: the thread mapping, i.e., the thread indexing and the size of the thread blocks run on each streaming multiprocessor (SM), is specified by the user. The mapping largely affects execution performance and may obstruct the automatic optimization performed by the MESI-CUDA compiler. The user must therefore find an optimal specification that takes physical parameters into account. In this paper, we propose a new thread mapping scheme. We introduce a new thread creation syntax that specifies a hardware-independent logical mapping, which is converted into an optimized physical mapping at compile time. By statically analyzing array index expressions, we obtain groups of threads that access the same or neighboring array elements. By mapping such threads into the same thread block and assigning them consecutive thread indices, the physical mapping is determined so as to maximize the effect of memory access optimization. In our evaluation, the scheme found an optimal mapping strategy for three benchmark programs.
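To illustrate why assigning consecutive thread indices to threads that access neighboring array elements matters, the following host-side C++ sketch counts how many memory segments one group of threads touches under two index assignments. It is a simplified model and not the MESI-CUDA analysis itself: the names `kWarpSize`, `kSegmentBytes`, `element_index`, and `segments_touched`, the 32-thread group, and the 128-byte segment size are all assumptions introduced for the example.

```cpp
#include <cstddef>
#include <set>

// Hypothetical model parameters (assumptions, not taken from the paper).
constexpr std::size_t kWarpSize = 32;       // threads that issue a load together
constexpr std::size_t kSegmentBytes = 128;  // bytes served by one memory transaction

// Row-major element index of a[row][col] in a float array with `width` columns.
std::size_t element_index(std::size_t row, std::size_t col, std::size_t width) {
    return row * width + col;
}

// Counts the distinct memory segments touched by threads 0..kWarpSize-1.
// If consecutive_along_row is true, thread tid reads a[0][tid];
// otherwise thread tid reads a[tid][0].
std::size_t segments_touched(bool consecutive_along_row, std::size_t width) {
    std::set<std::size_t> segments;
    for (std::size_t tid = 0; tid < kWarpSize; ++tid) {
        const std::size_t row = consecutive_along_row ? 0 : tid;
        const std::size_t col = consecutive_along_row ? tid : 0;
        const std::size_t byte_offset =
            element_index(row, col, width) * sizeof(float);
        segments.insert(byte_offset / kSegmentBytes);
    }
    return segments.size();
}
```

Under these assumptions, and with 4-byte floats in an array 1024 columns wide, `segments_touched(true, 1024)` reports a single segment while `segments_touched(false, 1024)` reports 32. That gap is the kind of cost difference a compile-time choice of physical mapping would aim to avoid.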
Keywords :
compiler generators; electronic data interchange; formal specification; graphics processing units; multi-threading; parallel architectures; program diagnostics; shared memory systems; storage management; GPGPU programming framework; GPU thread; MESI-CUDA compiler; Open CL; array index expression; automatic optimization; data transfer; general purpose computation; hand-optimization; hand-optimizing technique; hardware-independent logical mapping; high-performance computing; low level memory management; low-level code generation; low-level specification; memory access optimization; neighboring array element; optimal mapping strategy; optimal specification; optimized physical mapping; static analysis; streaming multiprocessor; thread block; thread creation syntax; thread indexing; thread indices; thread mapping scheme; virtual shared memory model; Arrays; Graphics processing units; Indexes; Instruction sets; Kernel; Optimization; Programming; CUDA; GPGPU; compiler; optimization; parallel programming;
Conference_Title :
Computing and Networking (CANDAR), 2014 Second International Symposium on
DOI :
10.1109/CANDAR.2014.104