DocumentCode :
3538032
Title :
A Trip to Tahiti: Approaching a 5 TFlop SGEMM Using 3 AMD GPUs
Author :
Weber, Rick ; Peterson, Gregory D.
Author_Institution :
Dept. of EECS, Univ. of Tennessee, Knoxville, TN, USA
fYear :
2012
fDate :
10-11 July 2012
Firstpage :
19
Lastpage :
25
Abstract :
Using GPUs as computational accelerators has been a growing area of research in the past several years. One area particularly amenable to exploiting video card hardware is dense linear algebra. We continue this trend by generalizing the MAGMA xGEMM kernels, porting them to OpenCL, and tuning them to run on the AMD Radeon 7970. These kernels achieve up to 1.7 TFlops in SGEMM and 650 GFlops in DGEMM on a single GPU. We extend this performance to multiple GPUs using a parallel-for algorithm designed to run on multiple heterogeneous devices. Using 3 Radeon 7970s, our large GEMM algorithm obtains 4.37 TFlops in single precision and 1.64 TFlops in double precision.
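The abstract describes extending single-GPU GEMM kernels to several heterogeneous GPUs with a parallel-for, but this record gives no details of that algorithm. The following is only a minimal sketch under stated assumptions, not the authors' method: the output matrix C is cut into tiles, each hypothetical "device" is a Python thread, and devices self-schedule by pulling the next tile from a shared queue, so faster devices end up processing more tiles. It computes the special case C = AB (BLAS alpha = 1, beta = 0) in pure Python. The names `parallel_for_gemm`, `gemm_tile`, `num_devices`, and `tile` are illustrative, not part of MAGMA, OpenCL, or the paper.

```python
import queue
import threading


def gemm_tile(a, b, c, rows, cols):
    """Compute C[rows, cols] = A[rows, :] @ B[:, cols] for one output tile."""
    k_dim = len(b)
    for i in rows:
        a_row = a[i]
        for j in cols:
            acc = 0.0
            for k in range(k_dim):
                acc += a_row[k] * b[k][j]
            c[i][j] = acc


def parallel_for_gemm(a, b, num_devices=3, tile=16):
    """Multiply A (m x k) by B (k x n) by handing output tiles to workers.

    Hypothetical sketch: each thread stands in for one device. Threads pull
    tiles from a shared queue until it is empty, so a faster device simply
    takes more tiles. Under CPython's GIL this shows the scheduling only and
    gives no real speedup.
    """
    m, n = len(a), len(b[0])
    c = [[0.0] * n for _ in range(m)]

    # Enqueue every output tile as a (row range, column range) pair.
    work = queue.Queue()
    for i0 in range(0, m, tile):
        for j0 in range(0, n, tile):
            work.put((range(i0, min(i0 + tile, m)),
                      range(j0, min(j0 + tile, n))))

    # Per-device tile counters; each thread writes only its own slot.
    tiles_done = [0] * num_devices

    def device_loop(dev_id):
        while True:
            try:
                rows, cols = work.get_nowait()
            except queue.Empty:
                return
            # Tiles cover disjoint entries of C, so no lock is needed.
            gemm_tile(a, b, c, rows, cols)
            tiles_done[dev_id] += 1

    threads = [threading.Thread(target=device_loop, args=(d,))
               for d in range(num_devices)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return c, tiles_done


if __name__ == "__main__":
    c, counts = parallel_for_gemm([[1.0, 2.0], [3.0, 4.0]],
                                  [[5.0, 6.0], [7.0, 8.0]],
                                  num_devices=3, tile=1)
    print(c)       # [[19.0, 22.0], [43.0, 50.0]]
    print(counts)  # tiles handled per device; always sums to 4 here
```

Self-scheduling from a shared queue is one common way to balance load across devices of unequal speed without knowing their throughput in advance; a static split proportional to each device's measured throughput is an alternative. Which scheme the paper actually uses is not stated in this record.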
Keywords :
graphics processing units; matrix algebra; AMD 7970; AMD GPU; DGEMM; MAGMA xGEMM kernels; OpenCL; Radeon 7970; TFlop SGEMM; computational accelerators; dense linear algebra; double-precision general matrix multiplication; multiple heterogeneous devices; parallel-for algorithm; single-precision general matrix multiplication; video card hardware; Computer architecture; Graphics processing unit; Indexes; Kernel; Vectors; BLAS; GEMM; GPU; OpenCL; matrix multiply;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Application Accelerators in High Performance Computing (SAAHPC), 2012 Symposium on
Conference_Location :
Chicago, IL
ISSN :
2166-5133
Print_ISBN :
978-1-4673-2882-1
Type :
conf
DOI :
10.1109/SAAHPC.2012.19
Filename :
6319187