  • DocumentCode
    48518
  • Title
    Acceleration of the Dual-Field Domain Decomposition Algorithm Using MPI–CUDA on Large-Scale Computing Systems
  • Author
    Huan-Ting Meng; Jian-Ming Jin

  • Author_Institution
    Dept. of Electr. & Comput. Eng., Univ. of Illinois at Urbana-Champaign, Urbana, IL, USA
  • Volume
    62
  • Issue
    9
  • fYear
    2014
  • fDate
    Sept. 2014
  • Firstpage
    4706
  • Lastpage
    4715
  • Abstract
    It is well known that graphics processing units (GPUs) are able to accelerate highly parallelizable algorithms with a high speedup. However, for less-parallelizable algorithms such as the finite element method, novel schemes are needed to achieve a high speedup. In this paper, the dual-field domain decomposition (DFDD) method based on element-level decomposition (DFDD-ELD) is accelerated on a large GPU cluster. By using element-level subdomains, the DFDD-ELD computation can be easily mapped onto the GPU's granular processors and is thus highly parallelizable. Various electromagnetic problems are simulated to demonstrate the speedup and scalability of DFDD-ELD on a GPU cluster. With a careful GPU memory arrangement and thread allocation, we are able to achieve a significant speedup by utilizing GPUs in a message-passing interface (MPI)-based cluster environment. The same acceleration strategy can be applied to the acceleration of the discontinuous Galerkin time-domain (DGTD) algorithms.
  • Keywords
    Galerkin method; application program interfaces; graphics processing units; message passing; parallel architectures; DFDD method; DFDD-ELD computation; DGTD algorithms; GPU cluster; GPU granular processors; GPU memory arrangement; MPI; MPI-CUDA; acceleration strategy; discontinuous Galerkin time-domain algorithms; dual field domain decomposition algorithm; electromagnetic problems; element level decomposition; element level subdomains; finite element method; large scale computing systems; message-passing interface; thread allocation; Acceleration; Algorithm design and analysis; Computer architecture; Finite element analysis; Instruction sets; Vectors; Circuit analysis; compute unified device architecture (CUDA); finite-element analysis; graphics processing unit (GPU); high-performance computing; message-passing interface (MPI); multi-GPU; parallel programming; radar cross section; time-domain analysis
  • fLanguage
    English
  • Journal_Title
    Antennas and Propagation, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    0018-926X
  • Type
    jour
  • DOI
    10.1109/TAP.2014.2330608
  • Filename
    6832499