DocumentCode
117290
Title
Optimized three-dimensional stencil computation on Fermi and Kepler GPUs
Author
Vizitiu, Anamaria ; Itu, Lucian ; Nita, Cosmin ; Suciu, Constantin
Author_Institution
Dept. of Autom. & Inf. Technol., Transilvania Univ. of Brasov, Brasov, Romania
fYear
2014
fDate
9-11 Sept. 2014
Firstpage
1
Lastpage
6
Abstract
Stencil-based algorithms are used intensively in scientific computations. Graphics Processing Unit (GPU)-based implementations of stencil computations speed up execution significantly compared to conventional CPU-only systems. In this paper we focus on double precision stencil computations, which are required to meet the high accuracy requirements inherent in scientific computations. Starting from two baseline implementations (using two-dimensional and three-dimensional thread block structures, respectively), we employ different optimization techniques, which lead to seven kernel versions. Both Fermi and Kepler GPUs are used to evaluate the impact of the different optimization techniques on the two architectures. Overall, the GTX680 GPU card performs best with a kernel that uses a 2D thread block structure and optimized register and shared memory usage. We show that, whereas shared memory is not essential for Fermi GPUs, it is a highly efficient optimization technique for Kepler GPUs (mainly due to their different L1 cache usage). Furthermore, we evaluate the performance of Kepler GPU cards designed for desktop PCs and for notebook PCs. The results indicate that the ratio of their execution times is roughly equal to the inverse of the ratio of their power consumption.
Keywords
cache storage; graphics processing units; optimisation; shared memory systems; 2D thread block structure; Fermi GPU; GTX680 GPU card; Kepler GPU card; L1 cache usage; desktop PC; double precision stencil computations; kernel versions; notebook PC; optimization techniques; optimized register; optimized three-dimensional stencil computation; power consumption; scientific computations; shared memory usage; stencil based algorithms; three dimensional thread block structures; two dimensional thread block structures; Computer architecture; Graphics processing units; Instruction sets; Kernel; Optimization; Registers; Three-dimensional displays; Fermi; GPU; Kepler; double precision; optimization; stencil;
fLanguage
English
Publisher
IEEE
Conference_Titel
High Performance Extreme Computing Conference (HPEC), 2014 IEEE
Conference_Location
Waltham, MA
Print_ISBN
978-1-4799-6232-7
Type
conf
DOI
10.1109/HPEC.2014.7040968
Filename
7040968