DocumentCode :
3575039
Title :
LU Factorization of Small Matrices: Accelerating Batched DGETRF on the GPU
Author :
Dong, Tingxing ; Haidar, Azzam ; Luszczek, Piotr ; Harris, James Austin ; Tomov, Stanimire ; Dongarra, Jack
Author_Institution :
Univ. of Tennessee, Knoxville, TN, USA
fYear :
2014
Firstpage :
157
Lastpage :
160
Abstract :
Gaussian elimination is commonly used to solve dense linear systems in scientific models. In a large number of applications, a need arises to solve many small problems instead of a few large linear systems. The size of each of these small linear systems depends on the number of ordinary differential equations (ODEs) used in the model, and can be on the order of hundreds of unknowns. To efficiently exploit the computing power of modern accelerator hardware, these linear systems are processed in batches. To improve numerical stability, at least partial pivoting is required, most often accomplished with row pivoting. However, row pivoting can result in a severe performance penalty on GPUs because it introduces thread divergence and non-coalesced memory accesses. In this paper, we propose a batched LU factorization for GPUs that uses a multi-level blocked right-looking algorithm, which preserves the data layout but minimizes the penalty of partial pivoting. Our batched LU achieves up to a 2.5-fold speedup compared to the alternative CUBLAS solution on a K40c GPU.
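As a rough illustration of what a batched LU factorization with partial (row) pivoting computes, the sketch below factors each small matrix of a batch independently. It is a plain, unblocked CPU version in pure Python, not the multi-level blocked right-looking GPU algorithm the abstract describes; the function names `lu_partial_pivot` and `batched_lu` are hypothetical and chosen only for this example.

```python
def lu_partial_pivot(a):
    """Factor a square matrix (a list of row lists) so that P*A = L*U.

    Returns (lu, piv). lu stores the unit-lower-triangular L below the
    diagonal and U on and above it. piv[k] is the row that was swapped
    with row k at step k (0-based indices).
    """
    n = len(a)
    lu = [row[:] for row in a]  # work on a copy; the input is left untouched
    piv = list(range(n))
    for k in range(n):
        # Partial pivoting: pick the largest-magnitude entry in column k,
        # searching only rows k..n-1.
        p = max(range(k, n), key=lambda i: abs(lu[i][k]))
        if lu[p][k] == 0.0:
            raise ValueError("matrix is singular")
        piv[k] = p
        if p != k:
            lu[k], lu[p] = lu[p], lu[k]  # row interchange
        # Right-looking update: scale the column below the pivot, then
        # update the trailing submatrix.
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]
            m = lu[i][k]
            for j in range(k + 1, n):
                lu[i][j] -= m * lu[k][j]
    return lu, piv


def batched_lu(batch):
    """Factor every matrix in the batch independently (sequentially here)."""
    return [lu_partial_pivot(a) for a in batch]
```

The row interchange inside the loop is the step the abstract identifies as costly on GPUs: which rows are swapped depends on the data, so threads working on different matrices of a batch may take different paths and touch memory in irregular patterns.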
Keywords :
differential equations; graphics processing units; linear systems; CUBLAS solution; Gaussian elimination; K40c GPU; ODE; batched DGETRF; batched LU factorization; computing power; data layout; dense linear systems; least partial pivoting; modern accelerator hardware; multilevel blocked right looking algorithm; noncoalesced memory accesses; numerical stability; ordinary differential equations; row pivoting; scientific models; small matrices; thread divergence; Graphics processing units; Instruction sets; Kernel; Linear systems; Optimization; Parallel processing; Vectors; GPU; Gaussian Elimination; batched;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
High Performance Computing and Communications, 2014 IEEE 6th Intl Symp on Cyberspace Safety and Security, 2014 IEEE 11th Intl Conf on Embedded Software and Syst (HPCC,CSS,ICESS), 2014 IEEE Intl Conf on
Print_ISBN :
978-1-4799-6122-1
Type :
conf
DOI :
10.1109/HPCC.2014.30
Filename :
7056733