DocumentCode :
2061709
Title :
Sparse Matrix-Vector Multiplication Optimizations based on Matrix Bandwidth Reduction using NVIDIA CUDA
Author :
Xu, Shiming ; Lin, Hai Xiang ; Xue, Wei
Author_Institution :
Delft Inst. of Appl. Math., Tech. Univ. Delft, Delft, Netherlands
fYear :
2010
fDate :
10-12 Aug. 2010
Firstpage :
609
Lastpage :
614
Abstract :
This paper proposes optimizing sparse matrix-vector multiplication (SpMV) in CUDA using matrix bandwidth/profile reduction techniques. The time required to access the dense vector is decoupled from the SpMV computation. Reducing the matrix profile cuts the time required to access the dense vector by 17% in single precision (SP) and 24% in double precision (DP). The reduced matrix bandwidth also allows column index information to be compressed into shorter formats, which cuts the execution time for accessing matrix data in the ELLPACK format by 17% (SP) and 10% (DP). The overall SpMV speedup is 16% and 12.6% across the whole matrix test suite. The proposed optimization can be combined with other SpMV optimizations such as register blocking.
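The column index compression mentioned in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' CUDA kernel: the abstract only says that indices are stored in "shorter formats", so the row-relative int16 offsets used here are an assumption, and the function names to_ellpack_compressed and spmv_ellpack are hypothetical. The sketch assumes a square matrix whose bandwidth has already been reduced by a reordering step.

```python
import numpy as np

def to_ellpack_compressed(dense):
    """Pack a square dense array into ELLPACK, storing each column index
    as an int16 offset from the row index (assumed compression scheme)."""
    n_rows, n_cols = dense.shape
    if n_rows != n_cols:
        raise ValueError("this sketch assumes a square matrix")
    nz_cols = [np.flatnonzero(dense[i]) for i in range(n_rows)]
    width = max((len(c) for c in nz_cols), default=0)
    # Padding slots keep value 0 and offset 0, so they contribute nothing.
    values = np.zeros((n_rows, width), dtype=dense.dtype)
    offsets = np.zeros((n_rows, width), dtype=np.int16)
    for i, cols in enumerate(nz_cols):
        rel = cols - i  # distance from the diagonal, small for banded matrices
        if rel.size and np.abs(rel).max() > np.iinfo(np.int16).max:
            raise ValueError("bandwidth too large for int16 offsets")
        values[i, :len(cols)] = dense[i, cols]
        offsets[i, :len(cols)] = rel
    return values, offsets

def spmv_ellpack(values, offsets, x):
    """Compute y = A @ x from the compressed ELLPACK arrays."""
    rows = np.arange(values.shape[0])[:, None]
    cols = rows + offsets.astype(np.int64)  # recover absolute column indices
    return (values * x[cols]).sum(axis=1)

# Usage: a tridiagonal matrix has bandwidth 1, so every offset is -1, 0, or 1.
A = np.diag([2.0] * 5) + np.diag([-1.0] * 4, 1) + np.diag([-1.0] * 4, -1)
x = np.arange(5, dtype=float)
vals, offs = to_ellpack_compressed(A)
y = spmv_ellpack(vals, offs, x)
```

Storing a 2-byte offset instead of a 4-byte absolute index halves the index traffic per nonzero, which is the kind of saving a reduced bandwidth makes possible. The offsets only fit in the shorter type when every nonzero lies close enough to the diagonal.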
Keywords :
mathematics computing; matrix multiplication; operating system kernels; optimisation; parallel architectures; sparse matrices; vectors; ELLPACK format; NVIDIA CUDA; SpMV computation; matrix bandwidth reduction; register blocking; sparse matrix vector multiplication optimizations; Bandwidth; Finite element methods; Indexes; Instruction sets; Optimization; Proteins; Sparse matrices;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Distributed Computing and Applications to Business Engineering and Science (DCABES), 2010 Ninth International Symposium on
Conference_Location :
Hong Kong
Print_ISBN :
978-1-4244-7539-1
Type :
conf
DOI :
10.1109/DCABES.2010.162
Filename :
5571530