Title :
A data-centric profiler for parallel programs
Author :
Liu, Xu ; Mellor-Crummey, John
Author_Institution :
Dept. of Comput. Sci., Rice Univ., Houston, TX, USA
Abstract :
It is difficult to manually identify opportunities for enhancing data locality. To address this problem, we extended the HPCToolkit performance tools to support data-centric profiling of scalable parallel programs. Our tool uses hardware counters to directly measure memory access latency and attributes latency metrics to both variables and instructions. Different hardware counters provide insight into different aspects of data locality (or lack thereof). Unlike prior tools for data-centric analysis, our tool employs scalable measurement, analysis, and presentation methods that enable it to analyze the memory access behavior of scalable parallel programs with low runtime and space overhead. We demonstrate the utility of HPCToolkit's new data-centric analysis capabilities with case studies of five well-known benchmarks. In each benchmark, we identify performance bottlenecks caused by poor data locality and demonstrate non-trivial performance optimizations enabled by this guidance.
Keywords :
data analysis; optimisation; parallel programming; HPCToolkit data-centric analysis capabilities; HPCToolkit performance tools; data locality; data-centric profiler; hardware counters; latency metrics; memory access latency; parallel programs; performance optimizations; scalable parallel programs; Context; Instruction sets; Monitoring; Phasor measurement units; Resource management; Data-centric profiling; scalable profiler
Conference_Title :
High Performance Computing, Networking, Storage and Analysis (SC), 2013 International Conference for
Conference_Location :
Denver, CO
Print_ISBN :
978-1-4503-2378-9
DOI :
10.1145/2503210.2503297