• DocumentCode
    228778
  • Title

    Domain Decomposition Preconditioners for Communication-Avoiding Krylov Methods on a Hybrid CPU/GPU Cluster

  • Author

    Yamazaki, Ichitaro ; Rajamanickam, Sivasankaran ; Boman, Erik G. ; Hoemmen, Mark ; Heroux, Michael A. ; Tomov, Stanimire

  • Author_Institution
    Univ. of Tennessee, Knoxville, TN, USA
  • fYear
    2014
  • fDate
    16-21 Nov. 2014
  • Firstpage
    933
  • Lastpage
    944
  • Abstract
    Krylov subspace projection methods are widely used iterative methods for solving large-scale linear systems of equations. Researchers have demonstrated that communication-avoiding (CA) techniques can improve Krylov methods' performance on modern computers, where communication is becoming increasingly expensive compared to arithmetic operations. In this paper, we extend these studies with two major contributions. First, we present our implementation of a CA variant of the Generalized Minimum Residual (GMRES) method, called CA-GMRES, for solving nonsymmetric linear systems of equations on a hybrid CPU/GPU cluster. Our performance results on up to 120 GPUs show that CA-GMRES gives a speedup of up to 2.5x in total solution time over standard GMRES on a hybrid cluster with twelve Intel Xeon CPUs and three Nvidia Fermi GPUs on each node. We then outline a domain decomposition framework to introduce a family of preconditioners that are suitable for CA Krylov methods. Our preconditioners do not incur any additional communication and allow the easy reuse of existing algorithms and software for the subdomain solves. Experimental results on the hybrid CPU/GPU cluster demonstrate that CA-GMRES with preconditioning achieves a speedup of up to 7.4x over CA-GMRES without preconditioning, and a speedup of up to 1.7x over GMRES with preconditioning, in total solution time. These results confirm the potential of our framework to develop a practical and effective preconditioned CA Krylov method.
  • Keywords
    graphics processing units; iterative methods; mathematics computing; CA techniques; CAGMRES; Intel Xeon CPU; Krylov subspace projection methods; Nvidia Fermi GPU; communication-avoiding Krylov method; domain decomposition preconditioners; generalized minimum residual method; hybrid CPU-GPU cluster; iterative methods; large-scale linear systems of equations; Central Processing Unit; Graphics processing units; Jacobian matrices; Kernel; Linear systems; Sparse matrices; Vectors;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    High Performance Computing, Networking, Storage and Analysis, SC14: International Conference for
  • Conference_Location
    New Orleans, LA
  • Print_ISBN
    978-1-4799-5499-5
  • Type

    conf

  • DOI
    10.1109/SC.2014.81
  • Filename
    7013063