DocumentCode :
2397734
Title :
Cache Miss Analysis for GPU Programs Based on Stack Distance Profile
Author :
Tang, Tao ; Yang, Xuejun ; Lin, Yisong
Author_Institution :
Nat. Lab. for Parallel & Distrib. Process., Nat. Univ. of Defense Technol., Changsha, China
fYear :
2011
fDate :
20-24 June 2011
Firstpage :
623
Lastpage :
634
Abstract :
Using the graphics processing unit (GPU) to accelerate general-purpose computation has attracted much attention from both academia and industry because of the GPU's powerful computing capacity, and optimization of GPU programs has therefore become a popular research direction. To support general-purpose computing more efficiently, GPUs have integrated a general data cache to replace the existing software-managed on-chip memory. Consequently, improving the usage of the data cache is of vital importance to the performance of GPU programs. The foundation of cache locality optimizations is efficient analysis and prediction of cache behavior. Unfortunately, existing cache miss analysis models are based on sequential programs and thus cannot be used to analyze GPU programs directly. In this paper, based on a deep analysis of the GPU's execution model, we propose, for the first time, a cache miss analysis model for GPU programs. We divide the problem into two subproblems: stack distance profile analysis of a single thread block and cache contention analysis of multiple thread blocks. Experimental results from nine typical application kernels in the scientific computing field show that our method is efficient and can be used to guide cache locality optimizations for GPU programs.
Keywords :
cache storage; computer graphic equipment; coprocessors; multi-threading; GPU program optimisation; cache contention analysis; cache locality optimizations; cache miss analysis models; data cache; general purpose computing; graphics processing unit; multiple thread blocks; software-managed on-chip memory; stack distance profile analysis; Analytical models; Computational modeling; Graphics processing unit; Instruction sets; Kernel; Optimization; Silicon; GPU; cache miss analysis model; stack distance profile;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Distributed Computing Systems (ICDCS), 2011 31st International Conference on
Conference_Location :
Minneapolis, MN
ISSN :
1063-6927
Print_ISBN :
978-1-61284-384-1
Electronic_ISBN :
1063-6927
Type :
conf
DOI :
10.1109/ICDCS.2011.16
Filename :
5961739