Acceleration of the Dual-Field Domain Decomposition Algorithm Using MPI–CUDA on Large-Scale Computing Systems

Author

Huan-Ting Meng; Jian-Ming Jin

Author_Institution

Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign, Urbana, IL, USA

Volume

62

Issue

9

fYear

2014

fDate

Sept. 2014

Firstpage

4706

Lastpage

4715

Abstract

It is well known that graphics processing units (GPUs) can deliver high speedups for highly parallelizable algorithms. However, for less parallelizable algorithms such as the finite element method, novel schemes are needed to achieve a high speedup. In this paper, the dual-field domain decomposition (DFDD) method based on element-level decomposition (DFDD-ELD) is accelerated on a large GPU cluster. By using element-level subdomains, the DFDD-ELD computation can be easily mapped onto a GPU's granular processors and is thus highly parallelizable. Various electromagnetic problems are simulated to demonstrate the speedup and scalability of DFDD-ELD on a GPU cluster. With careful GPU memory arrangement and thread allocation, we achieve a significant speedup by utilizing GPUs in a message-passing interface (MPI)-based cluster environment. The same acceleration strategy can also be applied to discontinuous Galerkin time-domain (DGTD) algorithms.

Keywords

Galerkin method; application program interfaces; graphics processing units; message passing; parallel architectures; DFDD method; DFDD-ELD computation; DGTD algorithms; GPU cluster; GPU granular processors; GPU memory arrangement; MPI; MPI-CUDA; acceleration strategy; discontinuous Galerkin time-domain algorithms; dual field domain decomposition algorithm; electromagnetic problems; element level decomposition; element level subdomains; finite element method; large scale computing systems; message-passing interface; thread allocation; Acceleration; Algorithm design and analysis; Computer architecture; Finite element analysis; Instruction sets; Vectors; Circuit analysis; compute unified device architecture (CUDA); finite-element analysis; graphics processing unit (GPU); high-performance computing; message-passing interface (MPI); multi-GPU; parallel programming; radar cross section; time-domain analysis

fLanguage

English

Journal_Title

IEEE Transactions on Antennas and Propagation

Publisher

IEEE

ISSN

0018-926X

Type

Journal article

DOI

10.1109/TAP.2014.2330608

Filename

6832499