Title :
BarrierPoint: Sampled simulation of multi-threaded applications
Author :
Carlson, Trevor E. ; Heirman, W. ; Van Craeynest, Kenzo ; Eeckhout, Lieven
Author_Institution :
Ghent Univ., Ghent, Belgium
Abstract :
Sampling is a well-known technique to speed up architectural simulation of long-running workloads while maintaining accurate performance predictions. A number of sampling techniques have recently been developed that extend well-known single-threaded techniques to allow sampled simulation of multi-threaded applications. Unfortunately, prior work is limited to non-synchronizing applications (e.g., server throughput workloads); requires the functional simulation of the entire application using a detailed cache hierarchy, which limits the overall simulation speedup potential; leads to different units of work across different processor architectures, which complicates performance analysis; or requires massive machine resources to achieve reasonable simulation speedups. In this work, we propose BarrierPoint, a sampling methodology to accelerate simulation by leveraging globally synchronizing barriers in multi-threaded applications. BarrierPoint collects microarchitecture-independent code and data signatures to determine the most representative inter-barrier regions, called barrierpoints. BarrierPoint estimates total application execution time (and other performance metrics of interest) through detailed simulation of these barrierpoints only, leading to substantial simulation speedups. Barrierpoints can be simulated in parallel, use fewer simulation resources, and define fixed units of work to be used in performance comparisons across processor architectures. Our evaluation of BarrierPoint using NPB and Parsec benchmarks reports average simulation speedups of 24.7× (and up to 866.6×) with an average simulation error of 0.9% (and at most 2.9%). On average, BarrierPoint reduces the number of simulation machine resources needed by 78×.
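Illustrative sketch (not the authors' implementation): the abstract describes picking representative inter-barrier regions from signatures and extrapolating whole-application time from them. The Python below shows one way this could look, under stated assumptions: cluster labels are taken as given (the clustering method is not specified here), the representative is the region nearest its cluster centroid, and each representative's simulated time is scaled by the ratio of cluster instructions to representative instructions. The function names `pick_representatives` and `estimate_total_time`, and all input values, are hypothetical.

```python
from math import dist


def pick_representatives(signatures, labels):
    """Return {cluster_label: region_index} for the region nearest each centroid.

    Assumption: `labels` comes from some prior clustering of the
    per-region signature vectors; the clustering itself is not shown.
    """
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)

    reps = {}
    for label, members in clusters.items():
        dims = len(signatures[members[0]])
        centroid = [
            sum(signatures[i][d] for i in members) / len(members)
            for d in range(dims)
        ]
        reps[label] = min(members, key=lambda i: dist(signatures[i], centroid))
    return reps


def estimate_total_time(labels, instr_counts, reps, simulated_time):
    """Extrapolate total time from the simulated representatives only.

    Assumption: a representative's time scales linearly with the
    instruction count of the cluster it stands in for.
    """
    total = 0.0
    for label, rep in reps.items():
        cluster_instr = sum(
            n for lab, n in zip(labels, instr_counts) if lab == label
        )
        total += simulated_time[rep] * cluster_instr / instr_counts[rep]
    return total


# Made-up example: six inter-barrier regions in two clusters.
signatures = [[1.0, 0.0], [0.8, 0.2], [0.9, 0.1],
              [0.0, 1.0], [0.2, 0.8], [0.1, 0.9]]
labels = [0, 0, 0, 1, 1, 1]
instr_counts = [100, 100, 200, 50, 50, 100]

reps = pick_representatives(signatures, labels)
# Only the representatives are simulated in detail (times are invented).
simulated_time = {reps[0]: 4.0, reps[1]: 1.0}
print(reps, estimate_total_time(labels, instr_counts, reps, simulated_time))
```

Only the regions in `reps` need detailed simulation, and they are independent of one another, which is what allows them to be simulated in parallel.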
Keywords :
multi-threading; parallel architectures; BarrierPoint; NPB; Parsec benchmark; accelerate simulation; architectural simulation; average simulation error; cache hierarchy; data signature; functional simulation; globally synchronizing barrier; long-running workload; microarchitecture-independent code; multithreaded application; nonsynchronizing application; overall simulation speedup potential; performance analysis; performance comparison; performance metrics of interest; performance prediction; processor architecture; reasonable simulation speedup; representative inter-barrier region; sampled simulation; sampling methodology; sampling techniques; server throughput workload; simulation machine resource; simulation resource; substantial simulation speedup; well-known single-threaded techniques; Benchmark testing; Instruction sets; Load modeling; Microarchitecture; Synchronization; Vectors;
Conference_Title :
Performance Analysis of Systems and Software (ISPASS), 2014 IEEE International Symposium on
Conference_Location :
Monterey, CA
Print_ISBN :
978-1-4799-3604-5
DOI :
10.1109/ISPASS.2014.6844456