Title :
Implementation of parallel sparse Cholesky factorization on GPU
Author :
Dan Zou ; Yong Dou
Author_Institution :
Nat. Lab. for Parallel & Distrib. Process., Nat. Univ. of Defense Technol., Changsha, China
Abstract :
Direct methods for solving large sparse symmetric positive-definite linear systems are popular because of their generality and robustness. Their main bottleneck is sparse Cholesky factorization, which exhibits irregular memory access and an unbalanced workload. Over the past decade, many sparse Cholesky factorization algorithms have emerged that exploit new architectural features. However, the programming techniques currently employed on these platforms are insufficient for implementing sparse Cholesky factorization on many-core graphics processing units (GPUs), because irregular problem structures map poorly onto the single-instruction multiple-thread (SIMT) GPU architecture. In this paper, we propose a task-based software approach to parallel sparse Cholesky factorization for heterogeneous computing platforms with GPU accelerators. The tasks are generated by the CPU, and an efficient task-scheduling mechanism guarantees the correct order of task execution and ensures load-balanced execution on the GPU. The approach is compared with an existing solver on problems arising from a range of practical applications. Experimental results show that the proposed approach substantially improves the performance of sparse Cholesky factorization on the GPU, achieving speedups of 2.7× to 4×.
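The abstract does not describe the authors' task decomposition or scheduler in detail. As a rough illustration of the general idea of dependency-ordered tasks for Cholesky factorization, the following Python sketch uses the classic column-oriented cdiv/cmod task formulation. That formulation is an assumption chosen for clarity, not necessarily the paper's task granularity. The function names (`symbolic_structure`, `generate_tasks`, `factorize`) are hypothetical, the matrix is stored densely for brevity while its sparsity pattern (including fill) is tracked explicitly, and all tasks run sequentially on the CPU rather than on a GPU.

```python
from collections import deque
import math


def symbolic_structure(A):
    """Below-diagonal row indices of each column of L, fill-in included."""
    n = len(A)
    rows = [{i for i in range(k + 1, n) if A[i][k] != 0.0} for k in range(n)]
    for k in range(n):
        # Eliminating column k turns its below-diagonal rows into a clique:
        # every pair j < i in rows[k] creates a (possible) fill entry L[i][j].
        for j in rows[k]:
            rows[j] |= {i for i in rows[k] if i > j}
    return rows


def generate_tasks(rows):
    """Build cdiv/cmod tasks and their dependency counts (the 'CPU side')."""
    waiting = {}     # task -> number of unfinished predecessor tasks
    successors = {}  # task -> tasks that may become ready when it finishes
    for k in range(len(rows)):
        waiting.setdefault(("cdiv", k), 0)
        successors.setdefault(("cdiv", k), [])
        for j in sorted(rows[k]):
            mod = ("cmod", j, k)  # update column j using finished column k
            waiting[mod] = 1      # cmod(j, k) needs cdiv(k)
            successors[mod] = [("cdiv", j)]
            successors[("cdiv", k)].append(mod)
            # cdiv(j) must wait for every cmod(j, k) that targets column j
            waiting[("cdiv", j)] = waiting.get(("cdiv", j), 0) + 1
    return waiting, successors


def factorize(A):
    """Dependency-driven Cholesky A = L L^T; returns L and the execution order."""
    n = len(A)
    L = [[A[i][j] if i >= j else 0.0 for j in range(n)] for i in range(n)]
    rows = symbolic_structure(A)
    waiting, successors = generate_tasks(rows)
    ready = deque(t for t, c in waiting.items() if c == 0)
    order = []
    while ready:
        task = ready.popleft()
        if task[0] == "cdiv":
            k = task[1]
            L[k][k] = math.sqrt(L[k][k])   # scale the finished column k
            for i in rows[k]:
                L[i][k] /= L[k][k]
        else:
            _, j, k = task                 # column j -= L[j][k] * column k
            L[j][j] -= L[j][k] * L[j][k]
            for i in rows[k]:
                if i > j:
                    L[i][j] -= L[i][k] * L[j][k]
        order.append(task)
        for nxt in successors[task]:       # release tasks whose inputs are done
            waiting[nxt] -= 1
            if waiting[nxt] == 0:
                ready.append(nxt)
    return L, order


A = [[4.0, 1.0, 0.0, 1.0],
     [1.0, 4.0, 1.0, 0.0],
     [0.0, 1.0, 4.0, 1.0],
     [1.0, 0.0, 1.0, 4.0]]
L, order = factorize(A)
print(order)
```

Tasks that sit in the ready queue at the same time have no unfinished predecessors, so a GPU-side scheduler could in principle dispatch them concurrently. Several `cmod` tasks that target the same column would then need atomic updates or serialization, since they all write to that column.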
Keywords :
graphics processing units; linear systems; mathematics computing; matrix decomposition; parallel programming; scheduling; task analysis; GPU accelerators; architectural features; heterogeneous computing platforms; irregular memory access behavior; irregular problem structures; large sparse symmetric positive-definite linear systems; load balanced execution; many-core graphics processing units; parallel sparse Cholesky factorization algorithm; programming techniques; single-instruction multiple-thread GPU architectures; task execution ordering; task-based software approach; task-scheduling mechanism; unbalanced workload; GPU; sparse Cholesky factorization;
Conference_Title :
Computer Science and Network Technology (ICCSNT), 2012 2nd International Conference on
Conference_Location :
Changchun
Print_ISBN :
978-1-4673-2963-7
DOI :
10.1109/ICCSNT.2012.6526361