Regional Information Center for Science and Technology (RICeST) - Performance analysis of HPC applications with irregular tree data structures

DocumentCode :

3588668

Title :

Performance analysis of HPC applications with irregular tree data structures

Author :

Khawaja, Ahmed ; Jiajun Wang ; Gerstlauer, Andreas ; John, Lizy K. ; Malhotra, Dhairya ; Biros, George

Author_Institution :

Dept. of Electr. & Comput. Eng., Univ. of Texas at Austin, Austin, TX, USA

Year :

2014

Firstpage :

418

Lastpage :

425

Abstract :

Adaptive mesh refinement (AMR) numerical methods utilizing octree data structures are an important class of HPC applications, in particular for the solution of partial differential equations. Much effort goes into the implementation of efficient versions of these types of programs, where the emphasis is often on increasing multi-node performance when utilizing GPUs and coprocessors. By contrast, our analysis aims to characterize these workloads on traditional CPUs, as we believe that single-threaded intra-node performance of critical kernels is still a key factor for achieving performance at scale. Irregular workloads such as AMR methods in particular, however, exhibit severe underutilization on general-purpose processors. In this paper, we analyze the single-core performance of two state-of-the-art, highly scalable adaptive mesh refinement codes, one based on the Fast Multipole Method (FMM) and one based on the Finite Element Method (FEM), when running on an x86 CPU. We examined both scalar and vectorized implementations to identify performance bottlenecks. We demonstrate that vectorization can provide a significant benefit in achieving high performance. The greatest bottleneck to peak performance is the high fraction of non-floating-point instructions in the kernels.

Keywords :

mesh generation; octrees; parallel processing; partial differential equations; AMR numerical method; FEM; FMM; HPC application; adaptive mesh refinement; fast multipole method; finite element method; octree data structure; partial differential equation; Algorithm design and analysis; Bridges; Finite element analysis; Kernel; Octrees; Polynomials; Program processors; AVX; Fast Multipole Method; Finite Element Method; HPC; MANGLL; PAPI; PVFMM; SIMD; adaptive mesh refinement; irregular tree;

Language :

English

Publisher :

IEEE

Conference_Title :

Parallel and Distributed Systems (ICPADS), 2014 20th IEEE International Conference on

Type :

conf

DOI :

10.1109/PADSW.2014.7097837

Filename :

7097837

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3588668