• DocumentCode
    3588751
  • Title

    Atomic reduction based sparse matrix-transpose vector multiplication on GPUs

  • Author

    Yuan Tao ; Yangdong Deng ; Shuai Mu ; Mingfa Zhu ; Limin Xiao ; Li Ruan ; Zhibin Huang

  • Author_Institution
    State Key Lab. of Software Dev. Environ., Beihang Univ., Beijing, China
  • fYear
    2014
  • Firstpage
    987
  • Lastpage
    992
  • Abstract
    Sparse Matrix-Transpose Vector Product (SMTVP) is a frequently used computation pattern in High Performance Computing applications. Current linear algebra packages typically compute it by transposing the matrix and then performing a Sparse Matrix-Vector Product (SMVP). However, the transposition step can be a serious bottleneck on modern parallel computing platforms. Previous work proposed a relatively complex data structure for computing SMTVP efficiently on multi-core CPUs, but it proved to be inefficient on GPUs. In this work, we show that the Compressed Sparse Row (CSR) based SMVP algorithm can also be efficient for SMTVP computation on modern GPUs. The proposed method uses atomic operations to perform the reduction that computes the inner product of each row of the transposed matrix with the vector. Experimental results show that this simple technique can outperform the SMTVP flow of transposition plus SMVP released in the CUSPARSE package by up to 405-fold.
  • Keywords
    graphics processing units; mathematics computing; matrix multiplication; sparse matrices; CSR based SMVP algorithm; CUSPARSE package; SMTVP computation; atomic reduction; compressed sparse row; sparse matrix-transpose vector multiplication; sparse matrix-transpose vector product; transposition plus SMVP; Graphics processing units; Indexes; Instruction sets; Laboratories; Sparse matrices; Throughput; CSR; GPU; atomic operation
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Parallel and Distributed Systems (ICPADS), 2014 20th IEEE International Conference on
  • Type

    conf

  • DOI
    10.1109/PADSW.2014.7097920
  • Filename
    7097920
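
The idea summarized in the abstract, computing the product with the transposed matrix directly from the CSR arrays and accumulating into the output with atomic operations, can be illustrated with a small sketch. This is not the authors' GPU kernel: it is a sequential Python stand-in, and the function name `csr_transpose_matvec` and its parameters are hypothetical. The loop over rows plays the role of the parallel threads, and the `+=` marks the update that would presumably need to be an atomic add on a GPU, since different rows can contribute to the same output element.

```python
def csr_transpose_matvec(n_cols, row_ptr, col_idx, vals, x):
    """Return y = A^T x for a matrix A stored in CSR, without building A^T.

    n_cols  : number of columns of A (length of the result y)
    row_ptr : CSR row pointers, length n_rows + 1
    col_idx : column index of each stored nonzero
    vals    : value of each stored nonzero
    x       : input vector of length n_rows
    """
    y = [0.0] * n_cols
    n_rows = len(row_ptr) - 1
    # Assumption: on a GPU each row would be handled by its own thread
    # (or group of threads); here the rows are simply visited in order.
    for i in range(n_rows):
        xi = x[i]
        for k in range(row_ptr[i], row_ptr[i + 1]):
            # A[i, j] contributes A[i, j] * x[i] to y[j], with j = col_idx[k].
            # Several rows may target the same j, so a parallel version
            # would perform this update as an atomic add.
            y[col_idx[k]] += vals[k] * xi
    return y


# Example matrix (hypothetical data):
#     [1 0 2]
# A = [0 3 0]
#     [4 0 5]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
vals = [1.0, 2.0, 3.0, 4.0, 5.0]

y = csr_transpose_matvec(3, row_ptr, col_idx, vals, [1.0, 2.0, 3.0])
print(y)  # [13.0, 6.0, 17.0]
```

The sketch traverses the CSR arrays exactly as an ordinary SMVP would; only the roles of the row and column indices are swapped, which is why no explicit transposition is needed.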