Title :
Gyrokinetic toroidal simulations on leading multi- and manycore HPC systems
Author :
Madduri, Kamesh ; Ibrahim, Khaled Z. ; Williams, Samuel ; Im, Eun-Jin ; Ethier, Stephane ; Shalf, John ; Oliker, Leonid
Author_Institution :
NERSC/CRD, Lawrence Berkeley Nat. Lab., Berkeley, CA, USA
Abstract :
The gyrokinetic Particle-in-Cell (PIC) method is a critical computational tool enabling petascale fusion simulation research. In this work, we present novel multi- and manycore-centric optimizations to enhance performance of GTC, a PIC-based production code for studying plasma microturbulence in tokamak devices. Our optimizations encompass all six GTC subroutines and include multi-level particle and grid decompositions designed to improve multi-node parallel scaling, particle binning for improved load balance, GPU acceleration of key subroutines, and memory-centric optimizations to improve single-node scaling and reduce memory utilization. The new hybrid MPI-OpenMP and MPI-OpenMP-CUDA GTC versions achieve up to a 2× speedup over the production Fortran code on four parallel systems - clusters based on the AMD Magny-Cours, Intel Nehalem-EP, IBM BlueGene/P, and NVIDIA Fermi architectures. Finally, strong scaling experiments provide insight into parallel scalability, memory utilization, and programmability trade-offs for large-scale gyrokinetic PIC simulations, while attaining a 1.6× speedup on 49,152 XE6 cores.
Keywords :
Tokamak devices; application program interfaces; computational fluid dynamics; graphics processing units; message passing; multiprocessing systems; parallel processing; plasma kinetic theory; plasma simulation; plasma toroidal confinement; plasma turbulence; resource allocation; GPU acceleration; GTC subroutine; MPI-OpenMP CUDA GTC version; PIC based production code; computational tool; grid decomposition; gyrokinetic particle in cell method; large scale gyrokinetic PIC simulation; load balancing; manycore centric optimizations; memory centric optimization; memory utilization; multicore centric optimization; multilevel particle; multinode parallel scaling; parallel scalability; particle binning; petascale fusion simulation research; plasma microturbulence; programmability; single node scaling; tokamak devices; Computational modeling; Graphics processing unit; Instruction sets; Kernel; Optimization; Parallel processing; Programming; Particle-in-Cell; hybrid programming; multicore optimization;
Conference_Title :
High Performance Computing, Networking, Storage and Analysis (SC), 2011 International Conference for
Conference_Location :
Seattle, WA
Electronic_ISBN :
978-1-4503-0771-0