• DocumentCode
    166155
  • Title

    Applying the roofline model

  • Author

    Ofenbeck, Georg ; Steinmann, Ruedi ; Caparros, Victoria ; Spampinato, Daniele G. ; Püschel, Markus

  • Author_Institution
    Dept. of Comput. Sci., ETH Zurich, Zurich, Switzerland
  • fYear
    2014
  • fDate
    23-25 March 2014
  • Firstpage
    76
  • Lastpage
    85
  • Abstract
    The recently introduced roofline model plots the performance of executed code against its operational intensity (operations count divided by memory traffic). It also includes two platform-specific performance ceilings: the processor's peak performance and a ceiling derived from the memory bandwidth, which is relevant for code with low operational intensity. The model thus makes more precise the notions of memory- and compute-bound and, despite its simplicity, can provide an insightful visualization of bottlenecks. As such it can be valuable to guide manual code optimization as well as in education. Unfortunately, to date the model has been used almost exclusively with back-of-the-envelope calculations and not with measured data. In this paper we show how to produce roofline plots with measured data on recent generations of Intel platforms. We show how to accurately measure the necessary quantities for a given program using performance counters, including threaded and vectorized code, and for warm and cold cache scenarios. We explain the measurement approach, its validation, and discuss limitations. Finally, we show, to this extent for the first time, a set of roofline plots with measured data for common numerical functions on a variety of platforms and discuss their possible uses.
  • Keywords
    cache storage; multi-threading; program compilers; software performance evaluation; source code (software); Intel platforms; back-of-the-envelope calculations; bottleneck visualization; cold cache scenarios; compute-bound; executed code performance; manual code optimization; memory bandwidth; memory traffic; memory-bound; operation count; operational intensity; performance counters; platform-specific performance ceilings; processor peak performance; roofline model; roofline plots; threaded code; vectorized code; warm cache scenarios; Bandwidth; Bridges; Computational modeling; Microarchitecture; Q measurement; Radiation detectors; Time measurement;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Performance Analysis of Systems and Software (ISPASS), 2014 IEEE International Symposium on
  • Conference_Location
    Monterey, CA
  • Print_ISBN
    978-1-4799-3604-5
  • Type

    conf

  • DOI
    10.1109/ISPASS.2014.6844463
  • Filename
    6844463
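
The two ceilings described in the abstract can be illustrated with a minimal sketch. It assumes the standard roofline formulation, in which attainable performance is the smaller of the processor's peak performance and memory bandwidth times operational intensity. The function names and the platform numbers below are hypothetical and are not taken from the paper.

```python
def operational_intensity(flops, bytes_transferred):
    """Operations count divided by memory traffic (flops per byte)."""
    return flops / bytes_transferred


def roofline_bound(intensity, peak_flops_per_s, bandwidth_bytes_per_s):
    """Upper bound on performance: the lower of the two ceilings."""
    return min(peak_flops_per_s, bandwidth_bytes_per_s * intensity)


def classify(intensity, peak_flops_per_s, bandwidth_bytes_per_s):
    """Memory-bound left of the ridge point, compute-bound otherwise."""
    ridge = peak_flops_per_s / bandwidth_bytes_per_s
    return "memory-bound" if intensity < ridge else "compute-bound"


# Hypothetical platform: 100 Gflop/s peak, 20 GB/s memory bandwidth.
PEAK = 100e9
BANDWIDTH = 20e9

# Hypothetical kernel: 2e9 flops moving 8e9 bytes.
oi = operational_intensity(2e9, 8e9)
bound = roofline_bound(oi, PEAK, BANDWIDTH)
label = classify(oi, PEAK, BANDWIDTH)
```

In practice the inputs would come from measurements such as the performance-counter readings the abstract mentions, rather than from hand-picked constants.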