Title :
Implementation and Evaluation of Triple Precision BLAS Subroutines on GPUs
Author :
Mukunoki, Daichi ; Takahashi, Daisuke
Author_Institution :
Grad. Sch. of Syst. & Inf. Eng., Univ. of Tsukuba, Tsukuba, Japan
Abstract :
We implemented and evaluated the triple precision Basic Linear Algebra Subprograms (BLAS) subroutines AXPY, GEMV, and GEMM on a Tesla C2050. In this paper, we present a Double Single (D+S) type triple precision floating-point value format and its operations, which are based on techniques similar to those of Double-Double (DD) type quadruple precision operations. On the GPU, the D+S-type operations are more costly than the DD-type operations, both in theory and in practice. Therefore, the triple precision GEMM, which is a compute-bound operation, is slower than the quadruple precision GEMM. However, the triple precision AXPY and GEMV are memory-bound operations on the GPU, so their execution time is close to 3/4 of that of the corresponding quadruple precision subroutines. We therefore conclude that the triple precision value format is useful for memory-bound operations in cases where quadruple precision is not required but double precision is not sufficient.
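The 3/4 figure above matches the storage ratio of the two formats: a D+S value occupies one double plus one float (8 + 4 = 12 bytes), while a DD value occupies two doubles (16 bytes). The following is a minimal sketch of that arithmetic and of a D+S-style addition. It is not the paper's kernel: it assumes a value is held as a (double, float) pair and uses the standard two-sum error-free transformation, and the names `to_f32`, `two_sum`, and `ds_add` are hypothetical.

```python
import struct

# Bytes per element, using the standard sizes of binary64 and binary32.
DS_BYTES = struct.calcsize("<d") + struct.calcsize("<f")  # double + float
DD_BYTES = 2 * struct.calcsize("<d")                      # double + double


def to_f32(x: float) -> float:
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def two_sum(a: float, b: float):
    """Error-free addition: returns (s, e) with s = fl(a + b) and s + e = a + b."""
    s = a + b
    v = s - a
    e = (a - (s - v)) + (b - v)
    return s, e


def ds_add(x, y):
    """Add two hypothetical D+S values given as (hi: double, lo: float) pairs."""
    s, e = two_sum(x[0], y[0])   # add the high parts and capture the rounding error
    e += x[1] + y[1]             # fold in both low parts
    hi, lo = two_sum(s, e)       # renormalize so that hi carries the leading bits
    return hi, to_f32(lo)        # assumption: the low part is stored as a float


if __name__ == "__main__":
    print(DS_BYTES, DD_BYTES, DS_BYTES / DD_BYTES)
    print(ds_add((1.0, 0.0), (2.0 ** -60, 0.0)))
```

In plain double precision, `1.0 + 2.0 ** -60` rounds back to `1.0`; in the sketch the small term survives in the low part. Note that the extra rounding of the low part to a float is work that a DD addition does not need, which is in line with the abstract's remark that D+S operations cost more than DD operations.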
Keywords :
floating point arithmetic; graphics processing units; linear algebra; D+S-type operation; DD-type operation; GEMV; GPU; Tesla C2050; compute-bound operation; double single type triple precision floating-point value format; double-double type quadruple precision operation; memory-bound operation; quadruple precision GEMM; triple precision AXPY; triple precision BLAS subroutine; triple precision GEMM; triple precision basic linear algebra subprogram; triple precision value format; Algorithms; Arrays; Graphics processing unit; Instruction sets; Kernel; Layout; Libraries; BLAS; GPU; triple precision;
Conference_Title :
Parallel and Distributed Processing Symposium Workshops & PhD Forum (IPDPSW), 2012 IEEE 26th International
Conference_Location :
Shanghai
Print_ISBN :
978-1-4673-0974-5
DOI :
10.1109/IPDPSW.2012.175