Regional Information Center for Science and Technology - Accurate CUDA performance modeling for sparse matrix-vector multiplication

DocumentCode :

2954892

Title :

Accurate CUDA performance modeling for sparse matrix-vector multiplication

Author :

Guo, Ping ; Wang, Liqiang

Author_Institution :

Dept. of Comput. Sci., Univ. of Wyoming, Laramie, WY, USA

fYear :

2012

fDate :

2-6 July 2012

Firstpage :

496

Lastpage :

502

Abstract :

This paper presents an integrated analytical and profile-based CUDA performance modeling approach to accurately predict the kernel execution times of sparse matrix-vector multiplication (SpMV) for CSR, ELL, COO, and HYB SpMV CUDA kernels. Based on our experiments conducted on a collection of 8 widely used test matrices on an NVIDIA Tesla C2050, the execution times predicted by our model match the measured execution times of NVIDIA's SpMV implementations very well. Specifically, for 29 out of 32 test cases, the performance differences are under or around 7%. For the remaining 3 test cases, the differences are between 8% and 10%. For the CSR, ELL, COO, and HYB SpMV kernels, the differences are 4.2%, 5.2%, 1.0%, and 5.7% on average, respectively.
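For readers unfamiliar with the operation being modeled, the following is a minimal sequential sketch of SpMV over the CSR format named in the abstract. It is not the paper's CUDA kernel or NVIDIA's implementation; the function name `spmv_csr` and the array names `row_ptr`, `col_idx`, and `values` are illustrative assumptions following the usual CSR convention.

```python
def spmv_csr(row_ptr, col_idx, values, x):
    """Compute y = A * x for a sparse matrix A stored in CSR form.

    row_ptr[i]..row_ptr[i+1] delimits the nonzeros of row i;
    col_idx[k] and values[k] give the column and value of nonzero k.
    (Names are illustrative, not taken from the paper.)
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for row in range(n_rows):
        acc = 0.0
        # Accumulate the dot product of row `row` with x over its nonzeros.
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_idx[k]]
        y[row] = acc
    return y


# Example matrix (3 x 3):
#   [1 0 2]
#   [0 3 0]
#   [4 0 5]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
values = [1.0, 2.0, 3.0, 4.0, 5.0]

y = spmv_csr(row_ptr, col_idx, values, [1.0, 2.0, 3.0])
```

In a GPU kernel the outer loop over rows is what gets distributed across threads or warps, which is why the row-length distribution of a matrix affects execution time for each storage format.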

Keywords :

graphics processing units; parallel architectures; performance evaluation; sparse matrices; COO kernels; ELL kernels; HYB SpMV CUDA kernels; NVIDIA Tesla C2050; profile-based CUDA performance modeling approach; sparse matrix-vector multiplication; testing matrices; Analytical models; Benchmark testing; Computational modeling; Graphics processing unit; Kernel; Sparse matrices; Strips; CUDA; GPU; Performance modeling; Sparse Matrix-Vector Multiplication;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

High Performance Computing and Simulation (HPCS), 2012 International Conference on

Conference_Location :

Madrid

Print_ISBN :

978-1-4673-2359-8

Type :

conf

DOI :

10.1109/HPCSim.2012.6266964

Filename :

6266964

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2954892