• DocumentCode
    1393
  • Title
    Taming Hardware Event Samples for Precise and Versatile Feedback Directed Optimizations
  • Author
    Chen, Dehao ; Vachharajani, Neil ; Hundt, Robert ; Li, Xinliang ; Eranian, Stephane ; Chen, Wenguang ; Zheng, Weimin
  • Author_Institution
    Google Inc., Mountain View, CA, USA
  • Volume
    62
  • Issue
    2
  • fYear
    2013
  • fDate
    Feb. 2013
  • Firstpage
    376
  • Lastpage
    389
  • Abstract
    Feedback-directed optimization (FDO) is effective in improving application runtime performance, but has not been widely adopted due to the tedious dual-compilation model, the difficulties in generating representative training data sets, and the high runtime overhead of profile collection. The use of hardware-event sampling overcomes these drawbacks by providing a lightweight approach to collect execution profiles in the production environment, which naturally consumes representative input. Yet, hardware event samples are typically not precise at the instruction or basic-block granularity. These inaccuracies lead to missed performance when compared to instrumentation-based FDO. In this paper, we use Performance Monitoring Unit (PMU)-based sampling to collect the instruction frequency profiles. By collecting profiles using multiple events, and applying heuristics to predict the accuracy, we improve the accuracy of the profile. We also show how emerging techniques can be used to further improve the accuracy of the sample-based profile. Additionally, these emerging techniques are used to collect value profiles, as well as to assist a lightweight interprocedural optimizer. All these profiles are represented in a portable form, so they can be used across different platforms. We demonstrate that sampling-based FDO can achieve an average of 92 percent of the performance gains obtained using instrumentation-based exact profiles for both SPEC CINT2000 and CINT2006 benchmarks. The overhead of collection is only 0.93 percent on average, while compiler-based instrumentation incurs 2.0-351.5 percent overhead (and 10x overhead on an industrial web search application).
  • Keywords
    optimising compilers; sampling methods; PMU-based sampling; dual-compilation model; feedback directed optimization; hardware-event sampling; instruction frequency profile; instrumentation-based FDO; lightweight interprocedural optimizer; performance monitoring unit; Hardware; Instruments; Monitoring; Optimization; Phasor measurement units; Program processors; Radiation detectors; Sample profile; last branch record; performance counter
  • fLanguage
    English
  • Journal_Title
    Computers, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    0018-9340
  • Type
    jour
  • DOI
    10.1109/TC.2011.233
  • Filename
    6109233