DocumentCode :
3697077
Title :
Vector Folding: Improving Stencil Performance via Multi-dimensional SIMD-vector Representation
Author :
Charles Yount
fYear :
2015
Firstpage :
865
Lastpage :
870
Abstract :
Stencil computation is an important class of algorithms used in a large variety of scientific-simulation applications. Modern CPUs are employing increasingly longer SIMD vector registers and operations to improve computational throughput. However, the traditional use of vectors to contain sequential data elements along one dimension is not always the most efficient representation, especially in the multicore and hyper-threaded context where caches are shared among many simultaneous compute streams. This paper presents a general technique for representing data in vectors for 2D and 3D stencils. This method reduces the number of memory accesses required by storing a small multi-dimensional block of data in each vector compared to the single dimension in the traditional approach. Experiments on an Intel Xeon Phi Coprocessor show performance speedups over traditional vectors ranging from 1.2x to 2.7x, depending on the problem size and stencil type. This technique is independent of and complementary to a variety of existing stencil-computation tuning algorithms such as cache blocking, loop tiling, and wavefront parallelization.
Keywords :
"Three-dimensional displays","Shape","Layout","Memory management","Jacobian matrices","Registers"
Publisher :
ieee
Conference_Titel :
High Performance Computing and Communications (HPCC), 2015 IEEE 7th International Symposium on Cyberspace Safety and Security (CSS), 2015 IEEE 12th International Conferen on Embedded Software and Systems (ICESS), 2015 IEEE 17th International Conference on
Type :
conf
DOI :
10.1109/HPCC-CSS-ICESS.2015.27
Filename :
7336272
Link To Document :
بازگشت