• DocumentCode
    65029
  • Title

    A Family of Bit-Representation-Optimized Formats for Fast Sparse Matrix-Vector Multiplication on the GPU

  • Author

    Wai Teng Tang ; Wen Jun Tan ; Goh, Rick Siow Mong ; Turner, Stephen John ; Weng-Fai Wong

  • Author_Institution
    Inst. of High-Performance Comput., Agency for Sci., Technol. & Res., Singapore, Singapore
  • Volume
    26
  • Issue
    9
  • fYear
    2015
  • fDate
    Sept. 1 2015
  • Firstpage
    2373
  • Lastpage
    2385
  • Abstract
    Sparse matrix-vector multiplication (SpMV) is an important kernel that is used in many iterative algorithms for solving scientific and engineering problems. One of the main challenges of SpMV is its memory-boundedness due to the low arithmetic intensity of the kernel. Although compression has been proposed previously to improve SpMV performance on CPUs, its use has not been demonstrated on the GPU because of the serial nature of many compression and decompression schemes. In this paper, we introduce a family of bit-representation-optimized (BRO) compression formats for representing sparse matrices on GPUs. The proposed formats - BRO-CSR, BRO-ELL and BRO-HYB, perform compression on index data and help to speed up SpMV on GPUs through the reduction of memory traffic. We also propose two other hybrid BRO formats which can potentially perform better than both HYB and BRO-HYB formats. Experimental results demonstrate that compared to uncompressed CSR and ELLPACK formats, our proposed compressed BRO-CSR and BRO-ELL formats are able to achieve average speedups of 2× and 1.4× respectively. Furthermore, we demonstrate that by using BRO-ELL, the preconditioned conjugate gradient method is able to achieve an average speedup of 1.3× over ELLPACK.
  • Keywords
    graphics processing units; iterative methods; sparse matrices; BRO compression formats; CPU; GPU; SpMV performance; arithmetic intensity; bit representation optimized; bit representation optimized formats; fast sparse matrix vector multiplication; iterative algorithms; memory traffic; Arrays; Graphics processing units; Indexes; Instruction sets; Kernel; Sparse matrices; GPU; Sparse matrix format; data compression; matrix-vector multiplication; memory bandwidth; parallelism;
  • fLanguage
    English
  • Journal_Title
    Parallel and Distributed Systems, IEEE Transactions on
  • Publisher
    ieee
  • ISSN
    1045-9219
  • Type

    jour

  • DOI
    10.1109/TPDS.2014.2357437
  • Filename
    6895301