DocumentCode
167580
Title
Predicting an Optimal Sparse Matrix Format for SpMV Computation on GPU
Author
Neelima, B. ; Reddy, G.R.M. ; Raghavendra, Prakash S.
Author_Institution
Dept. of Inf. Technol., Nat. Inst. of Technol. Karnataka, Mangalore, India
fYear
2014
fDate
19-23 May 2014
Firstpage
1427
Lastpage
1436
Abstract
Many-threaded Graphics Processing Units (GPUs) are well suited to general-purpose computation and can achieve high performance. The processor hides memory access latency by scheduling warps so that while one warp (a group of 32 threads) is computing, other warps perform memory accesses. However, for memory-bound irregular applications such as Sparse Matrix-Vector Multiplication (SpMV), memory access times are high, so improving the performance of such applications on the GPU is a challenging research problem. Furthermore, optimizing SpMV time on the GPU is an important task for iterative applications such as Jacobi and conjugate gradient solvers. The overheads incurred while computing SpMV on the GPU must also be considered: transforming the input matrix to the desired format and communicating the data from CPU to GPU are non-trivial overheads associated with SpMV computation on the GPU. If the chosen format is not suitable for the given input sparse matrix, the desired performance improvements cannot be achieved. Motivated by this observation, this paper proposes a method to choose an optimal sparse matrix format, focusing on applications where CPU-to-GPU communication time and pre-processing time are non-trivial. The experimental results show that the format predicted by the model matches the actual best-performing format when the total SpMV time, comprising pre-processing time, CPU-to-GPU communication time, and SpMV computation time on the GPU, is taken into account. The model predicts an optimal format for any given input sparse matrix with a very small prediction overhead within an application. Compared to choosing the format that achieves high performance only on the GPU, our approach is more comprehensive and valuable. This paper also proposes a sparse matrix format that optimizes communication and pre-processing overheads, to be used when these overheads are non-trivial.
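The selection criterion described in the abstract can be sketched as a simple total-cost comparison. This is a minimal illustration, not the paper's actual prediction model: the function name, format names, and all timings below are hypothetical, and the paper's model predicts the format without measuring every candidate.

```python
# Sketch of the total-cost criterion: the best format minimizes
# pre-processing + CPU-to-GPU transfer + per-iteration SpMV time,
# not GPU compute time alone.
def best_format(costs, iterations):
    """costs: {format_name: (preprocess_s, transfer_s, spmv_s)}"""
    def total(c):
        preprocess, transfer, spmv = c
        return preprocess + transfer + iterations * spmv
    return min(costs, key=lambda fmt: total(costs[fmt]))

# Illustrative (made-up) timings for one matrix, in seconds:
timings = {
    "CSR": (0.01, 0.05, 0.0040),
    "ELL": (0.08, 0.09, 0.0025),
    "HYB": (0.12, 0.07, 0.0020),
}
print(best_format(timings, 10))    # few iterations: low-overhead CSR wins
print(best_format(timings, 1000))  # many iterations: HYB's faster kernel
                                   # amortizes conversion and transfer costs
```

This captures why a format that is fastest on the GPU alone can lose overall in an iterative solver: its conversion and transfer overheads may never be amortized.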
Keywords
graphics processing units; iterative methods; matrix multiplication; multi-threading; parallel architectures; sparse matrices; CPU; GPU; SpMV computation; graphics processing unit; iterative application; latency hiding mechanism; many-threaded architecture; memory access time; optimal sparse matrix format; sparse matrix vector multiplication; Central Processing Unit; Computational modeling; Graphics processing units; Instruction sets; Kernel; Predictive models; Sparse matrices; CPU to GPU Communication; Graphics Processing Unit (GPU); Performance; Prediction; Sparse Format; Sparse Matrix Vector Multiplication (SpMV);
fLanguage
English
Publisher
ieee
Conference_Titel
2014 IEEE International Parallel & Distributed Processing Symposium Workshops (IPDPSW)
Conference_Location
Phoenix, AZ, USA
Print_ISBN
978-1-4799-4117-9
Type
conf
DOI
10.1109/IPDPSW.2014.160
Filename
6969545
Link To Document