• DocumentCode
    3000391
  • Title
    Implementation and Evaluation of Triple Precision BLAS Subroutines on GPUs
  • Author
    Mukunoki, Daichi ; Takahashi, Daisuke
  • Author_Institution
    Grad. Sch. of Syst. & Inf. Eng., Univ. of Tsukuba, Tsukuba, Japan
  • fYear
    2012
  • fDate
    21-25 May 2012
  • Firstpage
    1378
  • Lastpage
    1386
  • Abstract
    We implemented and evaluated triple precision Basic Linear Algebra Subprograms (BLAS) subroutines, namely AXPY, GEMV, and GEMM, on a Tesla C2050 GPU. In this paper, we present a Double Single (D+S) type triple precision floating-point value format and its operations, which are based on techniques similar to those of Double-Double (DD) type quadruple precision operations. On the GPU, D+S-type operations are more costly than DD-type operations, both in theory and in practice. Consequently, triple precision GEMM, which is a compute-bound operation, is slower than quadruple precision GEMM. Triple precision AXPY and GEMV, however, are memory-bound operations on the GPU, so their execution time is close to 3/4 of that of the corresponding quadruple precision subroutines. We therefore conclude that the triple precision value format is useful for memory-bound operations in cases where quadruple precision is not required but double precision is not sufficient.
  • Keywords
    floating point arithmetic; graphics processing units; linear algebra; D+S-type operation; DD-type operation; GEMV; GPU; Tesla C2050; compute-bound operation; double single type triple precision floating-point value format; double-double type quadruple precision operation; memory-bound operation; quadruple precision GEMM; triple precision AXPY; triple precision BLAS subroutine; triple precision GEMM; triple precision basic linear algebra subprogram; triple precision value format; Algorithms; Arrays; Graphics processing unit; Instruction sets; Kernel; Layout; Libraries; BLAS; GPU; triple precision
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Parallel and Distributed Processing Symposium Workshops & PhD Forum (IPDPSW), 2012 IEEE 26th International
  • Conference_Location
    Shanghai
  • Print_ISBN
    978-1-4673-0974-5
  • Type
    conf
  • DOI
    10.1109/IPDPSW.2012.175
  • Filename
    6270805
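
The 3/4 figure in the abstract is consistent with simple storage arithmetic, assuming a D+S value is stored as one 8-byte double plus one 4-byte single (12 bytes) and a DD value as two 8-byte doubles (16 bytes). The Python sketch below is illustrative only and is not the paper's GPU code: `two_sum` is the standard error-free addition (Knuth's TwoSum) commonly used in DD arithmetic, while `to_float32`, `ds_add`, and `traffic_ratio` are hypothetical helpers introduced here, and the storage layout is an assumption.

```python
import struct

# Assumed storage layouts (an assumption, not quoted from the paper):
# D+S = one double + one single, DD = two doubles.
DS_BYTES = struct.calcsize("<d") + struct.calcsize("<f")  # 8 + 4
DD_BYTES = 2 * struct.calcsize("<d")                      # 8 + 8


def to_float32(x):
    """Round a Python float (double) to the nearest single precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def two_sum(a, b):
    """Knuth's TwoSum: return (s, e) with s = fl(a + b) and a + b = s + e exactly."""
    s = a + b
    v = s - a
    e = (a - (s - v)) + (b - v)
    return s, e


def ds_add(x, y):
    """Hypothetical D+S addition: x and y are (hi, lo) pairs,
    with hi held as a double and lo rounded to single on output."""
    s, e = two_sum(x[0], y[0])   # add the high parts without losing the error
    e += x[1] + y[1]             # fold in the low parts
    hi, lo = two_sum(s, e)       # renormalize so that hi carries the leading bits
    return hi, to_float32(lo)


def traffic_ratio(n):
    """Bytes moved for n D+S elements relative to n DD elements."""
    return (n * DS_BYTES) / (n * DD_BYTES)


if __name__ == "__main__":
    print(DS_BYTES, DD_BYTES, traffic_ratio(1_000_000))
    print(ds_add((1.0, 0.0), (2.0 ** -60, 0.0)))
```

Because a memory-bound kernel such as AXPY or GEMV is limited by the bytes it reads and writes rather than by arithmetic, the per-element size ratio is a rough predictor of the run-time ratio under these assumptions.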