Title :
Accelerating the general band matrix multiplication using graphics processors
Author :
Benner, Peter ; Remon, Alfredo ; Dufrechou, Ernesto ; Ezzatti, Pablo ; Quintana-Orti, Enrique S.
Author_Institution :
Max Planck Inst. for Dynamics of Complex Tech. Syst., Magdeburg, Germany
Abstract :
In this paper, we leverage the intrinsic data-parallelism of the band matrix-matrix product to accelerate this operation on Graphics Processing Units (GPUs). In particular, we propose a Level-3 BLAS-style algorithm to tackle the band matrix-matrix product and implement two GPU-based versions that off-load the most expensive computations (i.e., general dense matrix-matrix multiplication, triangular matrix-matrix multiplication, and matrix addition) to the hardware accelerator. Results collected using GPUs from the two most recent NVIDIA generations (“Fermi” and “Kepler”) and a complete set of benchmark cases (which differ in the matrix dimensions and bandwidth) show that the GPU-enabled implementations deliver a notable reduction of the execution time.
Keywords :
graphics processing units; mathematics computing; matrix multiplication; Fermi generations; GPU-based versions; GPU-enabled implementations; Kepler generations; Level-3 BLAS style algorithm; NVIDIA; band matrix-matrix product; general band matrix multiplication; general dense matrix-matrix multiplication; graphics processing units; graphics processors; hardware accelerator; intrinsic data-parallelism; matrix addition; triangular matrix-matrix multiplication; Acceleration; Bandwidth; Graphics processing units; Hardware; Kernel; Partitioning algorithms; Sparse matrices; BLAS; GPU; General Band Matrix Multiplication; LAPACK;
Conference_Title :
Computing Conference (CLEI), 2014 XL Latin American
Conference_Location :
Montevideo
DOI :
10.1109/CLEI.2014.6965142