Title :
Optimizing Sparse Matrix Vector Multiplication Using Cache Blocking Method on Fermi GPU
Author :
Xu, Weizhi ; Zhang, Hao ; Jiao, Shuai ; Wang, Da ; Song, Fenglong ; Liu, Zhiyong
Author_Institution :
Key Lab. of Comput. Syst. & Archit., Inst. of Comput. Technol., Beijing, China
Abstract :
Tuning the performance of sparse matrix-vector multiplication (SpMV) is an important task, but it is also a difficult one because of the irregularity of its memory accesses. In this paper, we propose a cache blocking method to improve the performance of SpMV on the emerging GPU architecture. The sparse matrix is partitioned into many sub-blocks, each stored in CSR (compressed sparse row) format. Because each sub-block touches only the corresponding part of the vector x, that part can be reused from the GPU cache, which greatly reduces the time spent accessing x in global memory. Experimental results on a GeForce GTX 480 show that, in the best case, the SpMV kernel with the cache blocking method is 5x faster than the unblocked CSR kernel.
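The abstract does not give the kernel itself. The following is a minimal, CPU-side sketch of the general idea. It assumes the matrix is split into vertical column strips of a hypothetical width `block_cols`. The paper's actual sub-block shape and GPU thread mapping are not stated here. Each strip is stored in its own CSR arrays, and y is accumulated strip by strip, so each pass reads only one contiguous slice of x. The names `CSRBlock`, `to_column_blocks`, `blocked_spmv`, and `csr_spmv` are illustrative and do not come from the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CSRBlock:
    col_start: int        # first column of x covered by this strip
    col_end: int          # one past the last column covered
    row_ptr: List[int]    # length n_rows + 1, as in ordinary CSR
    col_idx: List[int]    # column indices relative to col_start
    vals: List[float]


def csr_spmv(row_ptr, col_idx, vals, x, n_rows):
    """Unblocked CSR SpMV, y = A x, used as a reference."""
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += vals[k] * x[col_idx[k]]
        y[i] = acc
    return y


def to_column_blocks(row_ptr, col_idx, vals, n_rows, n_cols, block_cols):
    """Split a CSR matrix into column strips of width block_cols (hypothetical
    blocking choice), storing each strip in its own CSR arrays."""
    blocks = []
    for start in range(0, n_cols, block_cols):
        end = min(start + block_cols, n_cols)
        b_ptr, b_idx, b_val = [0], [], []
        for i in range(n_rows):
            for k in range(row_ptr[i], row_ptr[i + 1]):
                j = col_idx[k]
                if start <= j < end:
                    b_idx.append(j - start)   # store a strip-local column index
                    b_val.append(vals[k])
            b_ptr.append(len(b_idx))
        blocks.append(CSRBlock(start, end, b_ptr, b_idx, b_val))
    return blocks


def blocked_spmv(blocks, x, n_rows):
    """y = A x computed strip by strip; each strip reads only x[col_start:col_end]."""
    y = [0.0] * n_rows
    for b in blocks:
        # On a GPU, this slice is the part of x intended to stay cache-resident.
        x_slice = x[b.col_start:b.col_end]
        for i in range(n_rows):
            acc = 0.0
            for k in range(b.row_ptr[i], b.row_ptr[i + 1]):
                acc += b.vals[k] * x_slice[b.col_idx[k]]
            y[i] += acc
    return y


if __name__ == "__main__":
    # A = [[1, 0, 2, 0],
    #      [0, 3, 0, 4],
    #      [5, 0, 0, 6]]
    row_ptr = [0, 2, 4, 6]
    col_idx = [0, 2, 1, 3, 0, 3]
    vals = [1, 2, 3, 4, 5, 6]
    blocks = to_column_blocks(row_ptr, col_idx, vals, 3, 4, block_cols=2)
    print(blocked_spmv(blocks, [1, 2, 3, 4], 3))  # [7.0, 22.0, 29.0]
```

The blocked result matches the unblocked CSR product. Only the order in which x is read changes. In a GPU version, `block_cols` would presumably be chosen so that one slice of x fits in the cache. That choice is the tuning trade-off against the extra passes over y that blocking introduces.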
Keywords :
cache storage; digital arithmetic; graphics processing units; matrix multiplication; optimisation; parallel architectures; sparse matrices; CSR format; Fermi GPU; GPU architecture; GeForce GTX 480; SpMV; cache blocking method; sparse matrix vector multiplication optimization; unblocked CSR kernel; Bandwidth; Computer architecture; Graphics processing unit; Instruction sets; Kernel; Sparse matrices; Vectors; GPU; SpMV; cache blocking;
Conference_Title :
Software Engineering, Artificial Intelligence, Networking and Parallel & Distributed Computing (SNPD), 2012 13th ACIS International Conference on
Conference_Location :
Kyoto
Print_ISBN :
978-1-4673-2120-4
DOI :
10.1109/SNPD.2012.20