DocumentCode :
3235178
Title :
3.5-D Blocking Optimization for Stencil Computations on Modern CPUs and GPUs
Author :
Nguyen, Anthony ; Satish, Nadathur ; Chhugani, Jatin ; Kim, Changkyu ; Dubey, Pradeep
fYear :
2010
fDate :
13-19 Nov. 2010
Firstpage :
1
Lastpage :
13
Abstract :
Stencil computation sweeps over a spatial grid over multiple time steps to perform nearest-neighbor computations. The bandwidth-to-compute requirement for a large class of stencil kernels is very high, and their performance is bound by the available memory bandwidth. Since memory bandwidth grows more slowly than compute, the performance of stencil kernels will not scale with increasing compute density. We present a novel 3.5D-blocking algorithm that performs 2.5D-spatial and temporal blocking of the input grid into on-chip memory for both CPUs and GPUs. The resultant algorithm is amenable to both thread-level and data-level parallelism, and scales near-linearly with the SIMD width and the number of cores. Our implementations are faster than or comparable to state-of-the-art stencil implementations on CPUs and GPUs. Our implementation of the 7-point stencil is 1.5X faster on CPUs and 1.8X faster on GPUs for single-precision floating-point inputs than previously reported numbers. For Lattice Boltzmann methods, the corresponding speedup on CPUs is 2.1X.
Keywords :
computer graphic equipment; coprocessors; lattice Boltzmann methods; optimisation; 2.5D-spatial; 3.5D blocking optimization; CPU; GPU; Lattice Boltzmann methods; SIMD; data-level parallelism; memory bandwidth; nearest-neighbor computations; on-chip memory; single-precision floating point; stencil computation sweeps; stencil kernels; temporal blocking; thread-level parallelism; Bandwidth; Graphics processing unit; Kernel; Memory management; System-on-a-chip; Three dimensional displays
fLanguage :
English
Publisher :
IEEE
Conference_Title :
High Performance Computing, Networking, Storage and Analysis (SC), 2010 International Conference for
Conference_Location :
New Orleans, LA
Print_ISBN :
978-1-4244-7557-5
Electronic_ISBN :
978-1-4244-7558-2
Type :
conf
DOI :
10.1109/SC.2010.2
Filename :
5645463