  • DocumentCode
    40777
  • Title
    GPU-Accelerated Sparse LU Factorization for Circuit Simulation with Performance Modeling
  • Author
    Xiaoming Chen ; Ling Ren ; Yu Wang ; Huazhong Yang
  • Author_Institution
    Dept. of Electron. Eng., Tsinghua Univ., Beijing, China
  • Volume
    26
  • Issue
    3
  • fYear
    2015
  • fDate
    Mar-15
  • Firstpage
    786
  • Lastpage
    795
  • Abstract
    The sparse matrix solver based on LU factorization is a serious bottleneck in Simulation Program with Integrated Circuit Emphasis (SPICE)-based circuit simulators. State-of-the-art graphics processing units (GPUs) have numerous cores sharing the same memory, provide attractive memory bandwidth and compute capability, and support massive thread-level parallelism, so GPUs can potentially accelerate the sparse solver in circuit simulators. In this paper, an efficient GPU-based sparse solver for circuit problems is proposed. We develop a hybrid parallel LU factorization approach combining task-level and data-level parallelism on GPUs. Work partitioning, the number of active thread groups, and memory access patterns are optimized based on the GPU architecture. Experiments show that the proposed LU factorization approach on an NVIDIA GTX580 attains an average speedup of 7.02× (geometric mean) compared with sequential PARDISO, and 1.55× compared with 16-threaded PARDISO. We also investigate the bottlenecks of the proposed approach using a parametric performance model. The performance of sparse LU factorization on GPUs is constrained by the global memory bandwidth, so it can be further improved by future GPUs with higher memory bandwidth.
  • Keywords
    SPICE; circuit simulation; graphics processing units; parallel programming; GPU-accelerated sparse LU factorization; GPU-based sparse solver; NVIDIA GTX580 GPU; SPICE-based circuit simulators; graphics processing unit; hybrid parallel LU factorization approach; massive thread-level parallelism; memory bandwidth; performance modeling; sequential PARDISO; simulation program with integrated circuit emphasis; sixteen-threaded PARDISO; Bandwidth; Graphics processing units; Instruction sets; Integrated circuit modeling; Parallel processing; Sparse matrices; Virtual groups; Graphics processing unit; circuit simulation; parallel sparse LU factorization; performance model
  • fLanguage
    English
  • Journal_Title
    Parallel and Distributed Systems, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1045-9219
  • Type
    jour
  • DOI
    10.1109/TPDS.2014.2312199
  • Filename
    6774937