DocumentCode :
39326
Title :
Higher Order Method of Moments With a Parallel Out-of-Core LU Solver on GPU/CPU Platform
Author :
Xing Mu ; Hou-Xing Zhou ; Kang Chen ; Wei Hong
Author_Institution :
State Key Lab. of Millimeter Waves, Southeast Univ., Nanjing, China
Volume :
62
Issue :
11
fYear :
2014
fDate :
Nov. 2014
Firstpage :
5634
Lastpage :
5646
Abstract :
This paper presents in detail a full realization of the higher order method of moments (HMoM) with a parallel out-of-core LU solver on a GPU/CPU platform. The work consists of three parts. In the first part, a global auxiliary table and a local auxiliary table are introduced to eliminate many tedious and repetitive calculations, and a GPU-oriented implementation is then proposed and optimized. In the second part, an overlapped grouping of all the curved quadrilaterals is proposed. With this scheme, all the submatrices can be generated efficiently one by one, without any wasted calculations, with the help of both the video memory and the host memory. In the third part, a GPU-based out-of-core algorithm for LU decomposition is proposed and further developed into a hybrid GPU/CPU algorithm. Numerical examples test the robustness of the proposed algorithm by comparison with measurements and/or the traditional MoM with RWG basis functions, and demonstrate its overall performance by comparison with the existing algorithm for similar problems. For generating the HMoM matrix, the proposed algorithm achieves a speedup ratio of about 7 to 12 over the GPU-based algorithm in the literature. For LU decomposition, its speedup ratio over the 8-threaded CPU-based algorithm can exceed 13 in single precision and 7 in double precision.
Keywords :
graphics processing units; matrix algebra; method of moments; parallel algorithms; 8-threaded CPU-based algorithm; GPU-based algorithm; GPU-based out-of-core algorithm; GPU-oriented programming; GPU/CPU platform; HMoM matrix; LU decomposition; RWG basis functions; curved quadrilaterals; global auxiliary table; higher order method of moments; host memory; hybrid GPU/CPU algorithm; local auxiliary table; overlapped grouping; parallel out-of-core LU solver; repetitive calculations; video memory; Graphics processing units; Hard disks; Instruction sets; Method of moments; Programming; Random access memory; System-on-chip; CUDA; GPU; OpenMP; high-order basis function; method of moments (MoM); out-of-core LU solver; parallel algorithm; speedup ratio;
fLanguage :
English
Journal_Title :
Antennas and Propagation, IEEE Transactions on
Publisher :
IEEE
ISSN :
0018-926X
Type :
jour
DOI :
10.1109/TAP.2014.2350536
Filename :
6881670