DocumentCode
2000149
Title
Acceleration of a High Order Finite-Difference WENO Scheme for Large-Scale Cosmological Simulations on GPU
Author
Chen Meng ; Long Wang ; Zongyan Cao ; Xianfeng Ye ; Long-Long Feng
Author_Institution
Supercomput. Center, Comput. Network Inf. Center, Beijing, China
fYear
2013
fDate
20-24 May 2013
Firstpage
2071
Lastpage
2078
Abstract
In this work, we present our implementation of a three-dimensional, fifth-order finite-difference weighted essentially non-oscillatory (WENO) scheme in double precision on CPU/GPU clusters, targeting large-scale cosmological hydrodynamic flow simulations that involve both shocks and complicated smooth solution structures. At the MPI parallelization level, we subdivided the domain along each of the three axial directions. Then, on each process, we ported the WENO computation to the GPU. The method is memory-bound because of the calculation of the weights, and this becomes a greater challenge for a 3D high-order problem in double precision. To make full use of the GPU's computing power and work around its memory limitations, we performed a series of optimizations focused on memory access patterns at all levels. We subjected the code to a number of typical tests to evaluate its effectiveness and efficiency. Our tests indicate that, relative to a single-threaded Fortran reference code, the GPU version achieves an overall speed-up of 12 to 19 times, and about 19 to 36 times in the computation part. We analyzed the results on both Fermi and Kepler GPUs. We also outline what is needed to further increase the speed by reducing the time spent on communication, along with other future work.
Keywords
astronomy computing; cosmology; finite difference methods; flow simulation; graphics processing units; hydrodynamics; message passing; CPU-GPU clusters; Fermi GPU; Kepler GPU; MPI parallelization; axial directions; graphics processing unit; high order finite-difference WENO scheme; large-scale cosmological hydrodynamic flow simulations; memory limitation; message passing interface; monothread Fortran code reference; solution structures; weighted essentially nonoscillatory scheme; Electric shock; Equations; Graphics processing units; Instruction sets; Kernel; Mathematical model; Three-dimensional displays; 3D; GPU; WENO; cosmological hydrodynamic; double precision;
fLanguage
English
Publisher
IEEE
Conference_Titel
Parallel and Distributed Processing Symposium Workshops & PhD Forum (IPDPSW), 2013 IEEE 27th International
Conference_Location
Cambridge, MA
Print_ISBN
978-0-7695-4979-8
Type
conf
DOI
10.1109/IPDPSW.2013.169
Filename
6651112