Title :
A Trip to Tahiti: Approaching a 5 TFlop SGEMM Using 3 AMD GPUs
Author :
Weber, Rick ; Peterson, Gregory D.
Author_Institution :
Dept. of EECS, Univ. of Tennessee, Knoxville, TN, USA
Abstract :
Using GPUs as computational accelerators has been a growing area of research in the past several years. One particular area amenable to exploiting video card hardware is dense linear algebra. We continue this trend by generalizing the MAGMA xGEMM kernels, porting them to OpenCL and tuning them to run on the AMD 7970. Achieving up to 1.7 TFlops in SGEMM and 650 GFlops in DGEMM, we extend this performance to multiple GPUs using a parallel-for algorithm designed to run on multiple heterogeneous devices. Using 3 Radeon 7970s, our large GEMM algorithm obtains 4.37 TFlops in single precision and 1.64 TFlops in double precision.
Keywords :
graphics processing units; matrix algebra; AMD 7970; AMD GPU; DGEMM; MAGMA xGEMM kernels; OpenCL; Radeon 7970; TFlop SGEMM; computational accelerators; dense linear algebra; double-precision general matrix multiplication; multiple heterogeneous devices; parallel-for algorithm; single-precision general matrix multiplication; video card hardware; Computer architecture; Graphics processing unit; Indexes; Kernel; Vectors; BLAS; GEMM; GPU; OpenCL; matrix multiply;
Conference_Title :
Application Accelerators in High Performance Computing (SAAHPC), 2012 Symposium on
Conference_Location :
Chicago, IL
Print_ISBN :
978-1-4673-2882-1
DOI :
10.1109/SAAHPC.2012.19