  • DocumentCode
    166142
  • Title
    BarrierPoint: Sampled simulation of multi-threaded applications
  • Author
    Carlson, Trevor E. ; Heirman, W. ; Van Craeynest, Kenzo ; Eeckhout, Lieven
  • Author_Institution
    Ghent Univ., Ghent, Belgium
  • fYear
    2014
  • fDate
    23-25 March 2014
  • Firstpage
    2
  • Lastpage
    12
  • Abstract
    Sampling is a well-known technique to speed up architectural simulation of long-running workloads while maintaining accurate performance predictions. A number of sampling techniques have recently been developed that extend well-known single-threaded techniques to allow sampled simulation of multi-threaded applications. Unfortunately, prior work is limited to non-synchronizing applications (e.g., server throughput workloads); requires the functional simulation of the entire application using a detailed cache hierarchy, which limits the overall simulation speedup potential; leads to different units of work across different processor architectures, which complicates performance analysis; or requires massive machine resources to achieve reasonable simulation speedups. In this work, we propose BarrierPoint, a sampling methodology to accelerate simulation by leveraging globally synchronizing barriers in multi-threaded applications. BarrierPoint collects microarchitecture-independent code and data signatures to determine the most representative inter-barrier regions, called barrierpoints. BarrierPoint estimates total application execution time (and other performance metrics of interest) through detailed simulation of these barrierpoints only, leading to substantial simulation speedups. Barrierpoints can be simulated in parallel, use fewer simulation resources, and define fixed units of work to be used in performance comparisons across processor architectures. Our evaluation of BarrierPoint using NPB and Parsec benchmarks reports average simulation speedups of 24.7× (and up to 866.6×) with an average simulation error of 0.9% (and 2.9% at most). On average, BarrierPoint reduces the number of simulation machine resources needed by 78×.
  • Keywords
    multi-threading; parallel architectures; BarrierPoint; NPB; Parsec benchmark; accelerate simulation; architectural simulation; average simulation error; cache hierarchy; data signature; functional simulation; globally synchronizing barrier; long-running workload; microarchitecture-independent code; multithreaded application; nonsynchronizing application; overall simulation speedup potential; performance analysis; performance comparison; performance metrics of interest; performance prediction; processor architecture; reasonable simulation speedup; representative inter-barrier region; sampled simulation; sampling methodology; sampling techniques; server throughput workload; simulation machine resource; simulation resource; substantial simulation speedup; well-known single-threaded techniques; Benchmark testing; Instruction sets; Load modeling; Microarchitecture; Synchronization; Vectors;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Performance Analysis of Systems and Software (ISPASS), 2014 IEEE International Symposium on
  • Conference_Location
    Monterey, CA
  • Print_ISBN
    978-1-4799-3604-5
  • Type
    conf
  • DOI
    10.1109/ISPASS.2014.6844456
  • Filename
    6844456
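
The abstract describes picking representative inter-barrier regions from signatures and estimating total execution time from detailed simulation of those regions only. The following is a minimal sketch of that idea, not the authors' implementation. The abstract does not state how regions are grouped, how a representative is chosen, or how its time is scaled, so everything concrete here is an assumption: cluster labels are taken as given, the representative is the region nearest its cluster centroid, and its simulated time is scaled by the ratio of cluster instructions to representative instructions. The names `pick_representatives`, `estimate_total_time`, `insn_counts`, and `simulated_time`, and all numbers, are hypothetical.

```python
from collections import defaultdict
from math import dist


def pick_representatives(signatures, labels):
    """For each cluster, return the index of the region closest to the centroid."""
    members = defaultdict(list)
    for idx, label in enumerate(labels):
        members[label].append(idx)

    reps = {}
    for label, idxs in members.items():
        dims = len(signatures[idxs[0]])
        centroid = [sum(signatures[i][d] for i in idxs) / len(idxs)
                    for d in range(dims)]
        reps[label] = min(idxs, key=lambda i: dist(signatures[i], centroid))
    return reps


def estimate_total_time(labels, insn_counts, reps, simulated_time):
    """Extrapolate whole-run time from the simulated representatives only.

    Assumption (not from the abstract): every region in a cluster runs at the
    same time per instruction as its representative, so each representative's
    time is scaled by (cluster instructions / representative instructions).
    """
    cluster_insns = defaultdict(int)
    for label, count in zip(labels, insn_counts):
        cluster_insns[label] += count

    total = 0.0
    for label, rep in reps.items():
        multiplier = cluster_insns[label] / insn_counts[rep]
        total += multiplier * simulated_time[rep]
    return total


# Hypothetical data: six inter-barrier regions with 2-D signatures.
signatures = [(1.0, 0.0), (0.9, 0.1), (0.0, 1.0),
              (0.2, 0.8), (0.8, 0.2), (0.1, 0.9)]
labels = [0, 0, 1, 1, 0, 1]                    # assumed clustering result
insn_counts = [100, 200, 300, 100, 100, 200]   # instructions per region

reps = pick_representatives(signatures, labels)
# Only the representatives are simulated in detail (times are made up).
simulated_time = {reps[0]: 4.0, reps[1]: 10.0}
total = estimate_total_time(labels, insn_counts, reps, simulated_time)
print(reps, total)
```

Because each representative is simulated independently, the entries of `simulated_time` could in principle be produced by separate simulator runs in parallel, which matches the abstract's claim that barrierpoints can be simulated in parallel.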