DocumentCode :
40777
Title :
GPU-Accelerated Sparse LU Factorization for Circuit Simulation with Performance Modeling
Author :
Xiaoming Chen ; Ling Ren ; Yu Wang ; Huazhong Yang
Author_Institution :
Dept. of Electron. Eng., Tsinghua Univ., Beijing, China
Volume :
26
Issue :
3
fYear :
2015
fDate :
Mar-15
Firstpage :
786
Lastpage :
795
Abstract :
Sparse matrix solution by LU factorization is a serious bottleneck in Simulation Program with Integrated Circuit Emphasis (SPICE)-based circuit simulators. State-of-the-art graphics processing units (GPUs) have numerous cores sharing the same memory, provide attractive memory bandwidth and compute capability, and support massive thread-level parallelism, so they can potentially accelerate the sparse solver in circuit simulators. In this paper, an efficient GPU-based sparse solver for circuit problems is proposed. We develop a hybrid parallel LU factorization approach that combines task-level and data-level parallelism on GPUs. Work partitioning, the number of active thread groups, and memory access patterns are optimized for the GPU architecture. Experiments show that the proposed LU factorization approach on an NVIDIA GTX580 attains an average speedup (geometric mean) of 7.02× compared with sequential PARDISO, and 1.55× compared with 16-threaded PARDISO. We also investigate the bottlenecks of the proposed approach using a parametric performance model. The performance of sparse LU factorization on GPUs is constrained by global memory bandwidth, so it can be further improved by future GPUs with larger memory bandwidth.
Keywords :
SPICE; circuit simulation; graphics processing units; parallel programming; GPU-accelerated sparse LU factorization; GPU-based sparse solver; NVIDIA GTX580 GPU; SPICE-based circuit simulators; graphics processing unit; hybrid parallel LU factorization approach; massive thread-level parallelism; memory bandwidth; performance modeling; sequential PARDISO; simulation program with integrated circuit emphasis; sixteen-threaded PARDISO; Bandwidth; Graphics processing units; Instruction sets; Integrated circuit modeling; Parallel processing; Sparse matrices; Virtual groups; Graphics processing unit; circuit simulation; parallel sparse LU factorization; performance model
fLanguage :
English
Journal_Title :
Parallel and Distributed Systems, IEEE Transactions on
Publisher :
IEEE
ISSN :
1045-9219
Type :
jour
DOI :
10.1109/TPDS.2014.2312199
Filename :
6774937