• DocumentCode
    2000149
  • Title
    Acceleration of a High Order Finite-Difference WENO Scheme for Large-Scale Cosmological Simulations on GPU
  • Author
    Chen Meng; Long Wang; Zongyan Cao; Xianfeng Ye; Long-Long Feng
  • Author_Institution
    Supercomputing Center, Computer Network Information Center, Beijing, China
  • fYear
    2013
  • fDate
    20-24 May 2013
  • Firstpage
    2071
  • Lastpage
    2078
  • Abstract
    In this work, we present our implementation of a three-dimensional, fifth-order finite-difference weighted essentially non-oscillatory (WENO) scheme in double precision on CPU/GPU clusters. It targets large-scale cosmological hydrodynamic flow simulations that involve both shocks and complicated smooth solution structures. At the MPI level, we decomposed the domain along each of the three axial directions; within each process, we ported the WENO computation to the GPU. The method is memory-bound because of the weight calculations, and this becomes a greater challenge for a high-order 3D problem in double precision. To exploit the computing power of the GPU while working within its memory limitations, we performed a series of optimizations focused on memory access patterns at all levels. We evaluated the effectiveness and efficiency of the code on a number of typical tests. Relative to a single-threaded Fortran reference code, the GPU version achieves a speedup of 12x to 19x overall and about 19x to 36x in the computation part. We analyze the results on both Fermi and Kepler GPUs, and we outline what is needed to further increase performance by reducing the time spent on communication, along with other future work.
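The abstract attributes the memory-bound behavior to the weight calculations. As background only, here is a minimal sketch of the classical fifth-order WENO-JS reconstruction (the Jiang and Shu formulation) at a single interface x_{i+1/2}, written in plain Python. The function names, the regularization constant EPS, and the left-biased orientation are illustrative assumptions; this is not the authors' GPU or Fortran code. It shows why each output value needs five neighboring inputs plus three smoothness indicators and three nonlinear weights.

```python
# Minimal sketch of fifth-order WENO-JS reconstruction at x_{i+1/2}
# (left-biased stencil). Illustrative only; names and EPS are assumptions.

EPS = 1e-6                          # assumed regularization constant (common choice)
LINEAR_WEIGHTS = (0.1, 0.6, 0.3)    # optimal linear weights d0, d1, d2


def weno5_weights(v):
    """Nonlinear weights (w0, w1, w2) for v = (v[i-2], v[i-1], v[i], v[i+1], v[i+2])."""
    vm2, vm1, v0, vp1, vp2 = v
    # Smoothness indicators beta_k of the three 3-point candidate stencils.
    b0 = 13 / 12 * (vm2 - 2 * vm1 + v0) ** 2 + 0.25 * (vm2 - 4 * vm1 + 3 * v0) ** 2
    b1 = 13 / 12 * (vm1 - 2 * v0 + vp1) ** 2 + 0.25 * (vm1 - vp1) ** 2
    b2 = 13 / 12 * (v0 - 2 * vp1 + vp2) ** 2 + 0.25 * (3 * v0 - 4 * vp1 + vp2) ** 2
    # Unnormalized weights: a nonsmooth stencil (large beta) gets a tiny alpha.
    alphas = [d / (EPS + b) ** 2 for d, b in zip(LINEAR_WEIGHTS, (b0, b1, b2))]
    total = sum(alphas)
    return tuple(a / total for a in alphas)


def weno5_reconstruct(v):
    """Reconstructed value at x_{i+1/2} as a convex combination of three candidates."""
    vm2, vm1, v0, vp1, vp2 = v
    # Third-order candidate reconstructions from each sub-stencil.
    q0 = (2 * vm2 - 7 * vm1 + 11 * v0) / 6
    q1 = (-vm1 + 5 * v0 + 2 * vp1) / 6
    q2 = (2 * v0 + 5 * vp1 - vp2) / 6
    w0, w1, w2 = weno5_weights(v)
    return w0 * q0 + w1 * q1 + w2 * q2


if __name__ == "__main__":
    # Linear data: all candidates give 2.5, so the result is 2.5 up to rounding.
    print(weno5_reconstruct([0, 1, 2, 3, 4]))
    # Step data: almost all weight falls on the smooth left stencil, so no overshoot.
    print(weno5_reconstruct([0, 0, 0, 1, 1]))
```

In a 3D code, this per-interface work is repeated along each axis for every grid point, and each reconstruction rereads overlapping neighbors, which is consistent with the abstract's emphasis on optimizing memory access.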
  • Keywords
    astronomy computing; cosmology; finite difference methods; flow simulation; graphics processing units; hydrodynamics; message passing; CPU-GPU clusters; Fermi GPU; Kepler GPU; MPI parallelization; axial directions; graphics processing unit; high order finite-difference WENO scheme; large-scale cosmological hydrodynamic flow simulations; memory limitation; message passing interface; monothread Fortran code reference; solution structures; weighted essentially nonoscillatory scheme; Electric shock; Equations; Graphics processing units; Instruction sets; Kernel; Mathematical model; Three-dimensional displays; 3D; GPU; WENO; cosmological hydrodynamic; double precision;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    2013 IEEE 27th International Parallel and Distributed Processing Symposium Workshops & PhD Forum (IPDPSW)
  • Conference_Location
    Cambridge, MA
  • Print_ISBN
    978-0-7695-4979-8
  • Type
    conf
  • DOI
    10.1109/IPDPSW.2013.169
  • Filename
    6651112