DocumentCode :
1914100
Title :
Performance Tuning of Matrix Multiplication in OpenCL on Different GPUs and CPUs
Author :
Matsumoto, Kazuya ; Nakasato, N. ; Sedukhin, Stanislav G.
Author_Institution :
Grad. Sch. of Comput. Sci. & Eng., Univ. of Aizu, Aizu-Wakamatsu, Japan
fYear :
2012
fDate :
10-16 Nov. 2012
Firstpage :
396
Lastpage :
405
Abstract :
OpenCL (Open Computing Language) is a framework for general-purpose parallel programming. Programs written in OpenCL are functionally portable across multiple processors, including CPUs, GPUs, and FPGAs. An auto-tuning technique can make the performance of OpenCL programs portable across different processors as well. We have developed an auto-tuning system with a code generator for fast matrix multiply kernels in OpenCL. This paper presents performance evaluation results for DGEMM (double-precision general matrix multiply) and SGEMM (single-precision GEMM) implementations obtained with the auto-tuning system. The evaluations are conducted on two AMD GPUs (Tahiti and Cayman), two NVIDIA GPUs (Kepler and Fermi), and two CPUs (Intel Sandy Bridge and AMD Bulldozer). Our GEMM implementations on the AMD GPUs show higher performance than the highly tuned vendor library, while the implementations on the NVIDIA GPUs perform comparably to the vendor library.
Keywords :
graphics processing units; matrix multiplication; parallel programming; public domain software; software performance evaluation; software portability; AMD Bulldozer CPU; Cayman AMD GPU; DGEMM performance evaluation; FPGA; Fermi NVIDIA GPU; Intel Sandy Bridge CPU; Kepler NVIDIA GPU; Open Computing Language framework; OpenCL program performance evaluation; SGEMM performance evaluation; Tahiti AMD GPU; autotuning technique; code generator; double-precision general matrix multiply kernels; general-purpose parallel programming; matrix multiplication performance tuning; program portability; single-precision GEMM; GPU; OpenCL; auto-tuning; dense linear algebra
fLanguage :
English
Publisher :
IEEE
Conference_Title :
High Performance Computing, Networking, Storage and Analysis (SCC), 2012 SC Companion
Conference_Location :
Salt Lake City, UT
Print_ISBN :
978-1-4673-6218-4
Type :
conf
DOI :
10.1109/SC.Companion.2012.59
Filename :
6495841