DocumentCode :
1414082
Title :
Autotuning GEMM Kernels for the Fermi GPU
Author :
Kurzak, Jakub ; Tomov, Stanimire ; Dongarra, Jack
Author_Institution :
Electr. Eng. & Comput. Sci. Dept., Univ. of Tennessee, Knoxville, TN, USA
Volume :
23
Issue :
11
Year :
2012
Firstpage :
2045
Lastpage :
2057
Abstract :
In recent years, the use of graphics chips has been recognized as a viable way of accelerating scientific and engineering applications, even more so since the introduction of the Fermi architecture by NVIDIA, with features essential to numerical computing, such as fast double precision arithmetic and memory protected with error correction codes. Being the crucial component of numerical software packages, such as LAPACK and ScaLAPACK, the general dense matrix multiplication routine is one of the more important workloads to be implemented on these devices. This paper presents a methodology for producing matrix multiplication kernels tuned for a specific architecture, through a canonical process of heuristic autotuning, based on generation of multiple code variants and selecting the fastest ones through benchmarking. The key contribution of this work is in the method for generating the search space; specifically, pruning it to a manageable size. Performance numbers match or exceed other available implementations.
Keywords :
graphics processing units; matrix multiplication; search problems; software packages; Fermi GPU; Fermi architecture; NVIDIA; ScaLAPACK; autotuning GEMM kernels; benchmarking; canonical process; engineering applications; error correction codes; fast double precision arithmetic; general dense matrix multiplication routine; graphics chips; graphics processing unit; heuristic autotuning; matrix multiplication kernels; multiple code variants; numerical computing; numerical software packages; scientific applications; search space; Computer architecture; Graphics processing unit; Hardware; Instruction sets; Kernel; Registers; BLAS; CUDA; GEMM; Graphics processing unit; automatic tuning; code generation; matrix multiplication;
Language :
English
Journal_Title :
Parallel and Distributed Systems, IEEE Transactions on
Publisher :
IEEE
ISSN :
1045-9219
Type :
jour
DOI :
10.1109/TPDS.2011.311
Filename :
6122021