• DocumentCode
    580074
  • Title
    Memory-efficient optimization of Gyrokinetic particle-to-grid interpolation for multicore processors
  • Author
    Madduri, Kamesh ; Williams, S. ; Ethier, Stephane ; Oliker, Leonid ; Shalf, J. ; Strohmaier, E. ; Yelick, K.
  • Author_Institution
    CRD/NERSC, Lawrence Berkeley Nat. Lab., Berkeley, CA, USA
  • fYear
    2009
  • fDate
    14-20 Nov. 2009
  • Firstpage
    1
  • Lastpage
    12
  • Abstract
    We present multicore parallelization strategies for the particle-to-grid interpolation step in the Gyrokinetic Toroidal Code (GTC), a 3D particle-in-cell (PIC) application to study turbulent transport in magnetic-confinement fusion devices. Particle-grid interpolation is a known performance bottleneck in several PIC applications. In GTC, this step involves particles depositing charges to a 3D toroidal mesh, and multiple particles may contribute to the charge at a grid point. We design new parallel algorithms for the GTC charge deposition kernel, and analyze their performance on three leading multicore platforms. We implement thirteen different variants for this kernel and identify the best-performing ones given typical PIC parameters such as the grid size, number of particles per cell, and the GTC-specific particle Larmor radius variation. We find that our best strategies can be 2x faster than the reference optimized MPI implementation, and our analysis provides insight into desirable architectural features for high-performance PIC simulation codes.
  • Keywords
    circuit optimisation; interpolation; memory architecture; mesh generation; multiprocessing systems; parallel algorithms; 3D particle-in-cell; 3D toroidal mesh; GTC charge deposition kernel; PIC application; architectural feature; grid size; gyrokinetic particle-to-grid interpolation; gyrokinetic toroidal code; high-performance PIC simulation code; magnetic-confinement fusion device; memory-efficient optimization; multicore parallelization strategies; multicore platform; multicore processor; parallel algorithm; particle Larmor radius variation
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    High Performance Computing Networking, Storage and Analysis, Proceedings of the Conference on
  • Conference_Location
    Portland, OR
  • Type
    conf
  • DOI
    10.1145/1654059.1654108
  • Filename
    6375522