Regional Information Center for Science and Technology - Implementation and Evaluation of Triple Precision BLAS Subroutines on GPUs

DocumentCode :

3000391

Title :

Implementation and Evaluation of Triple Precision BLAS Subroutines on GPUs

Author :

Mukunoki, Daichi ; Takahashi, Daisuke

Author_Institution :

Grad. Sch. of Syst. & Inf. Eng., Univ. of Tsukuba, Tsukuba, Japan

fYear :

2012

fDate :

21-25 May 2012

Firstpage :

1378

Lastpage :

1386

Abstract :

We implemented and evaluated the triple precision Basic Linear Algebra Subprograms (BLAS) subroutines AXPY, GEMV, and GEMM on a Tesla C2050. In this paper, we present a Double Single (D+S) type triple precision floating-point value format and its operations, which are based on techniques similar to Double-Double (DD) type quadruple precision operations. On the GPU, the D+S-type operations are more costly than the DD-type operations in theory and in practice. Therefore, the triple precision GEMM, which is a compute-bound operation, is slower than the quadruple precision GEMM. However, the triple precision AXPY and GEMV are memory-bound operations on the GPU, so their execution time is close to 3/4 of that of the corresponding quadruple precision subroutines. We therefore conclude that the triple precision value format is useful for memory-bound operations in cases where quadruple precision is not required but double precision is not sufficient.
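
The abstract does not spell out the D+S algorithms, so the following is only a minimal sketch of the general idea, assuming a value is stored as an unevaluated sum of a double (`hi`) and a single (`lo`), and that addition follows the usual Double-Double pattern built on Knuth's TwoSum. The names `to_f32`, `two_sum`, `ds_from_fraction`, `ds_add`, and `ds_value` are hypothetical and are not taken from the paper; the paper's GPU kernels may differ in every detail. The last two lines illustrate the storage arithmetic that is consistent with the reported 3/4 ratio for memory-bound routines: 8 + 4 = 12 bytes per D+S value against 8 + 8 = 16 bytes per DD value.

```python
import struct
from fractions import Fraction


def to_f32(x: float) -> float:
    """Round a binary64 value to the nearest binary32 value (returned as a Python float)."""
    return struct.unpack("f", struct.pack("f", x))[0]


def two_sum(a: float, b: float):
    """Knuth's TwoSum: return (s, e) with s = fl(a + b) and a + b == s + e exactly."""
    s = a + b
    v = s - a
    e = (a - (s - v)) + (b - v)
    return s, e


def ds_from_fraction(q: Fraction):
    """Split an exact rational into a hypothetical D+S pair (hi: double, lo: single)."""
    hi = float(q)                           # leading part, rounded to double
    lo = to_f32(float(q - Fraction(hi)))    # residual, rounded to single
    return hi, lo


def ds_add(x, y):
    """Add two D+S pairs with a simplified DD-style addition (an assumption, not the paper's kernel)."""
    s, e = two_sum(x[0], y[0])   # exact sum of the double parts
    e += x[1] + y[1]             # fold in both single-precision tails
    hi, lo = two_sum(s, e)       # renormalize so that hi carries the leading bits
    return hi, to_f32(lo)        # the tail is stored back as a single


def ds_value(x) -> Fraction:
    """Exact rational value represented by a D+S pair."""
    return Fraction(x[0]) + Fraction(x[1])


# Storage per element with no padding: D+S = double + single, DD = double + double.
DS_BYTES = struct.calcsize("<df")   # 12
DD_BYTES = struct.calcsize("<dd")   # 16
```

A short usage example under the same assumptions: `ds_add(ds_from_fraction(Fraction(1, 3)), ds_from_fraction(Fraction(1, 7)))` returns a pair whose exact value, obtained with `ds_value`, is closer to 10/21 than the plain double sum `1/3 + 1/7`. For a routine whose run time is dominated by memory traffic, moving `DS_BYTES / DD_BYTES = 0.75` of the data is one plausible reading of why the triple precision AXPY and GEMV take about 3/4 of the quadruple precision time.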

Keywords :

floating point arithmetic; graphics processing units; linear algebra; D+S-type operation; DD-type operation; GEMV; GPU; Tesla C2050; compute-bound operation; double single type triple precision floating-point value format; double-double type quadruple precision operation; memory-bound operation; quadruple precision GEMM; triple precision AXPY; triple precision BLAS subroutine; triple precision GEMM; triple precision basic linear algebra subprogram; triple precision value format; Algorithms; Arrays; Graphics processing unit; Instruction sets; Kernel; Layout; Libraries; BLAS; GPU; triple precision;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Parallel and Distributed Processing Symposium Workshops & PhD Forum (IPDPSW), 2012 IEEE 26th International

Conference_Location :

Shanghai

Print_ISBN :

978-1-4673-0974-5

Type :

conf

DOI :

10.1109/IPDPSW.2012.175

Filename :

6270805

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3000391