Title :
Accurate CUDA performance modeling for sparse matrix-vector multiplication
Author :
Guo, Ping ; Wang, Liqiang
Author_Institution :
Dept. of Comput. Sci., Univ. of Wyoming, Laramie, WY, USA
Abstract :
This paper presents an integrated analytical and profile-based CUDA performance modeling approach that accurately predicts the kernel execution times of sparse matrix-vector multiplication (SpMV) for the CSR, ELL, COO, and HYB SpMV CUDA kernels. In experiments on a collection of 8 widely used test matrices on an NVIDIA Tesla C2050, the execution times predicted by our model closely match the measured execution times of NVIDIA's SpMV implementations. Specifically, for 29 out of 32 test cases, the performance differences are under or around 7%. For the remaining 3 test cases, the differences are between 8% and 10%. For the CSR, ELL, COO, and HYB SpMV kernels, the differences are 4.2%, 5.2%, 1.0%, and 5.7% on average, respectively.
Keywords :
graphics processing units; parallel architectures; performance evaluation; sparse matrices; COO kernels; ELL kernels; HYB SpMV CUDA kernels; NVIDIA Tesla C2050; profile-based CUDA performance modeling approach; sparse matrix-vector multiplication; testing matrices; Analytical models; Benchmark testing; Computational modeling; Graphics processing unit; Kernel; Sparse matrices; Strips; CUDA; GPU; Performance modeling; Sparse Matrix-Vector Multiplication;
Conference_Title :
High Performance Computing and Simulation (HPCS), 2012 International Conference on
Conference_Location :
Madrid
Print_ISBN :
978-1-4673-2359-8
DOI :
10.1109/HPCSim.2012.6266964