DocumentCode :
3588668
Title :
Performance analysis of HPC applications with irregular tree data structures
Author :
Khawaja, Ahmed ; Wang, Jiajun ; Gerstlauer, Andreas ; John, Lizy K. ; Malhotra, Dhairya ; Biros, George
Author_Institution :
Dept. of Electr. & Comput. Eng., Univ. of Texas at Austin, Austin, TX, USA
fYear :
2014
Firstpage :
418
Lastpage :
425
Abstract :
Adaptive mesh refinement (AMR) numerical methods utilizing octree data structures are an important class of HPC applications, in particular for the solution of partial differential equations. Much effort goes into implementing efficient versions of these programs, where the emphasis is often on increasing multi-node performance when utilizing GPUs and coprocessors. By contrast, our analysis aims to characterize these workloads on traditional CPUs, as we believe that single-threaded intra-node performance of critical kernels is still a key factor for achieving performance at scale. Irregular workloads such as AMR methods in particular, however, exhibit severe underutilization on general-purpose processors. In this paper, we analyze the single-core performance of two state-of-the-art, highly scalable adaptive mesh refinement codes, one based on the Fast Multipole Method (FMM) and one based on the Finite Element Method (FEM), when running on an x86 CPU. We examine both scalar and vectorized implementations to identify performance bottlenecks. We demonstrate that vectorization can provide a significant benefit in achieving high performance. The greatest bottleneck to peak performance is the high fraction of non-floating-point instructions in the kernels.
Keywords :
mesh generation; octrees; parallel processing; partial differential equations; AMR numerical method; FEM; FMM; HPC application; adaptive mesh refinement; fast multipole method; finite element method; octree data structure; partial differential equation; Algorithm design and analysis; Bridges; Finite element analysis; Kernel; Octrees; Polynomials; Program processors; AVX; Fast Multipole Method; Finite Element Method; HPC; MANGLL; PAPI; PVFMM; SIMD; adaptive mesh refinement; irregular tree
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Parallel and Distributed Systems (ICPADS), 2014 20th IEEE International Conference on
Type :
conf
DOI :
10.1109/PADSW.2014.7097837
Filename :
7097837