DocumentCode :
656134
Title :
AdELL: An Adaptive Warp-Balancing ELL Format for Efficient Sparse Matrix-Vector Multiplication on GPUs
Author :
Maggioni, Matteo ; Berger-Wolf, Tanya
Author_Institution :
Dept. of Comput. Sci., Univ. of Illinois at Chicago, Chicago, IL, USA
fYear :
2013
fDate :
1-4 Oct. 2013
Firstpage :
11
Lastpage :
20
Abstract :
The sparse matrix-vector multiplication (SpMV) is a fundamental computational kernel used in science and engineering. As a result, the performance of a large number of applications depends on the efficiency of the SpMV. This kernel is, in fact, a bandwidth-limited operation and poses a challenge for optimization when the matrix has an irregular structure. The literature on implementing SpMV on throughput-oriented many core processors is extensive and mostly focuses on matrix formats, proposing different ideas to adapt matrix sparsity to the underlying architecture. In this paper, we propose a novel ELL-based matrix format called Adaptive ELL (AdELL) to improve the state-of-the-art of the SpMV on Graphic Processing Units (GPUs). The AdELL format is based on the idea of distributing working threads to rows according to their computational load, creating balanced hardware-level blocks (warps) that take full advantage of the vectorized execution on Streaming Multiprocessors (SMs). The AdELL data structure is created using a novel warp-balancing heuristic designed to smooth the workload among warps without the need of tuning any parameters. AdELL provides an efficient warp-level synchronization (as opposed to block-level) but can also use atomic operations to distribute very skewed rows over multiple warps. Moreover, we introduce a loop unrolling heuristic that optimizes the SpMV performance by selecting the best unrolling factor based on the warp workload. We tested the proposed AdELL sparse format on a set of conventional benchmarks from heterogeneous application domains. The results show substantial and consistent performance improvements for double-precision calculations, outperforming the state-of-the-art ensemble framework clSpMV. We could observe speedup peaks up to 1.94 and a 25% (geometric) average improvement, which can be potentially increased to 43% introducing a simple 1x2 blocking strategy.
Keywords :
data structures; graphics processing units; mathematics computing; matrix decomposition; multiprocessing systems; synchronisation; AdELL data structure; GPUs; SMs; SpMV; adaptive ELL; adaptive warp-balancing ELL matrix format; balanced hardware-level blocks; computational kernel; graphic processing units; sparse matrix-vector multiplication; streaming multiprocessors; throughput-oriented many-core processors; warp-balancing heuristic; warp-level synchronization; Data structures; Graphics processing units; Instruction sets; Optimization; Parallel processing; Sparse matrices; Adaptive; ELL; GPU; Sparse Matrix Vector Multiplication; Warp-Balancing;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
Parallel Processing (ICPP), 2013 42nd International Conference on
Conference_Location :
Lyon
ISSN :
0190-3918
Type :
conf
DOI :
10.1109/ICPP.2013.10
Filename :
6687334
Link To Document :
بازگشت