DocumentCode
167580
Title
Predicting an Optimal Sparse Matrix Format for SpMV Computation on GPU
Author
Neelima, B. ; Reddy, G.R.M. ; Raghavendra, Prakash S.
Author_Institution
Dept. of Inf. Technol., Nat. Inst. of Technol. Karnataka, Mangalore, India
fYear
2014
fDate
19-23 May 2014
Firstpage
1427
Lastpage
1436
Abstract
Many-threaded Graphics Processing Units (GPUs) are well suited to general-purpose computation and can achieve high performance. The processor hides memory access latency by scheduling warps so that while one warp (a group of 32 threads) is computing, other warps perform memory accesses. However, for memory-bound irregular applications such as Sparse Matrix-Vector Multiplication (SpMV), memory access times are high, so improving the performance of such applications on the GPU is a challenging research problem. Furthermore, optimizing SpMV time on the GPU is an important task for iterative applications such as Jacobi and conjugate gradient solvers. The overheads incurred while computing SpMV on the GPU must also be considered: transforming the input matrix to the desired format and communicating the data from CPU to GPU are non-trivial overheads associated with SpMV computation on the GPU. If the chosen format is not suitable for the given input sparse matrix, the desired performance improvements cannot be achieved. Motivated by this observation, this paper proposes a method to choose an optimal sparse matrix format, focusing on applications where CPU-to-GPU communication time and pre-processing time are non-trivial. The experimental results show that the format predicted by the model matches the actual best-performing format when the total SpMV time, comprising pre-processing time, CPU-to-GPU communication time, and SpMV computation time on the GPU, is taken into account. The model predicts an optimal format for any given input sparse matrix with a very small prediction overhead within an application. Compared to choosing the format that achieves high performance only on the GPU, our approach is more comprehensive and valuable. This paper also proposes a sparse matrix format that optimizes communication and pre-processing overheads, to be used when these overheads are non-trivial.
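The selection criterion described in the abstract can be sketched as a simple total-cost comparison. This is a minimal illustration, not the paper's actual prediction model: the function name, format names, and all timings below are hypothetical, and the paper's model predicts the format without measuring every candidate.

```python
# Sketch of the total-cost criterion: the best format minimizes
# pre-processing + CPU-to-GPU transfer + per-iteration SpMV time,
# not GPU compute time alone.
def best_format(costs, iterations):
    """costs: {format_name: (preprocess_s, transfer_s, spmv_s)}"""
    def total(c):
        preprocess, transfer, spmv = c
        return preprocess + transfer + iterations * spmv
    return min(costs, key=lambda fmt: total(costs[fmt]))

# Illustrative (made-up) timings for one matrix, in seconds:
timings = {
    "CSR": (0.01, 0.05, 0.0040),
    "ELL": (0.08, 0.09, 0.0025),
    "HYB": (0.12, 0.07, 0.0020),
}
print(best_format(timings, 10))    # few iterations: low-overhead CSR wins
print(best_format(timings, 1000))  # many iterations: HYB's faster kernel
                                   # amortizes conversion and transfer costs
```

This captures why a format that is fastest on the GPU alone can lose overall in an iterative solver: its conversion and transfer overheads may never be amortized.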
Keywords
graphics processing units; iterative methods; matrix multiplication; multi-threading; parallel architectures; sparse matrices; CPU; GPU; SpMV computation; graphics processing unit; iterative application; latency hiding mechanism; many-threaded architecture; memory access time; optimal sparse matrix format; sparse matrix vector multiplication; Central Processing Unit; Computational modeling; Graphics processing units; Instruction sets; Kernel; Predictive models; Sparse matrices; CPU to GPU Communication; Graphics Processing Unit (GPU); Performance; Prediction; Sparse Format; Sparse Matrix Vector Multiplication (SpMV);
fLanguage
English
Publisher
ieee
Conference_Titel
2014 IEEE International Parallel & Distributed Processing Symposium Workshops (IPDPSW)
Conference_Location
Phoenix, AZ, USA
Print_ISBN
978-1-4799-4117-9
Type
conf
DOI
10.1109/IPDPSW.2014.160
Filename
6969545
Link To Document