Title :
Exploring Shared-Memory Optimizations for an Unstructured Mesh CFD Application on Modern Parallel Systems
Author :
Mudigere, Dheevatsa ; Sridharan, Srinivas ; Deshpande, Anand ; Park, Jongsoo ; Heinecke, Alexander ; Smelyanskiy, Mikhail ; Kaul, Bharat ; Dubey, Pradeep ; Kaushik, Dinesh ; Keyes, David
Author_Institution :
Parallel Comput. Lab., Intel Corp., Bangalore, India
Abstract :
In this work, we revisit the 1999 Gordon Bell Prize-winning PETSc-FUN3D aerodynamics code, extending it with highly tuned shared-memory parallelization and detailed performance analysis on modern highly parallel architectures. An unstructured-grid implicit flow solver, which forms the backbone of computational aerodynamics, poses particular challenges due to its large irregular working sets, unstructured memory accesses, and variable/limited amount of parallelism. This code, based on a domain decomposition approach, exposes tradeoffs between the number of threads assigned to each MPI-rank subdomain and the total number of domains. By applying several algorithm- and architecture-aware optimization techniques for unstructured grids, we show a 6.9X speed-up on a single-node Intel® Xeon™ E5-2690 v2 processor relative to the out-of-the-box compilation. Our scaling studies on the TACC Stampede supercomputer show that our optimizations continue to provide performance benefits over the baseline implementation as we scale up to 256 nodes.
Keywords :
aerodynamics; application program interfaces; computational fluid dynamics; mesh generation; message passing; parallel architectures; shared memory systems; MPI-rank subdomain; PETSc-FUN3D aerodynamics code; TACC Stampede supercomputer; algorithm-aware optimization techniques; architecture-aware optimization techniques; computational aerodynamics; domain decomposition approach; highly-tuned shared-memory parallelization; out-of-the-box compilation; parallel architectures; parallel systems; shared-memory optimizations; single-node Intel Xeon E5-2690 v2 processor; unstructured memory accesses; unstructured mesh CFD application; unstructured-grid implicit flow solver; Computational fluid dynamics; Instruction sets; Jacobian matrices; Kernel; Optimization; Parallel processing; Sparse matrices; CFD; Krylov Solver; Multi-core; OpenMP+MPI
Conference_Title :
Parallel and Distributed Processing Symposium (IPDPS), 2015 IEEE International
Conference_Location :
Hyderabad
DOI :
10.1109/IPDPS.2015.114