DocumentCode
166155
Title
Applying the roofline model
Author
Ofenbeck, Georg ; Steinmann, Ruedi ; Caparros, Victoria ; Spampinato, Daniele G. ; Puschel, Markus
Author_Institution
Dept. of Comput. Sci., ETH Zurich, Zurich, Switzerland
fYear
2014
fDate
23-25 March 2014
Firstpage
76
Lastpage
85
Abstract
The recently introduced roofline model plots the performance of executed code against its operational intensity (operations count divided by memory traffic). It also includes two platform-specific performance ceilings: the processor's peak performance and a ceiling derived from the memory bandwidth, which is relevant for code with low operational intensity. The model thus makes more precise the notions of memory- and compute-bound and, despite its simplicity, can provide an insightful visualization of bottlenecks. As such it can be valuable to guide manual code optimization as well as in education. Unfortunately, to date the model has been used almost exclusively with back-of-the-envelope calculations and not with measured data. In this paper we show how to produce roofline plots with measured data on recent generations of Intel platforms. We show how to accurately measure the necessary quantities for a given program using performance counters, including threaded and vectorized code, and for warm and cold cache scenarios. We explain the measurement approach, its validation, and discuss limitations. Finally, we show, to this extent for the first time, a set of roofline plots with measured data for common numerical functions on a variety of platforms and discuss their possible uses.
Keywords
cache storage; multi-threading; program compilers; software performance evaluation; source code (software); Intel platforms; back-of-the-envelope calculations; bottleneck visualization; cold cache scenarios; compute-bound; executed code performance; manual code optimization; memory bandwidth; memory traffic; memory-bound; operation count; operational intensity; performance counters; platform-specific performance ceilings; processor peak performance; roofline model; roofline plots; threaded code; vectorized code; warm cache scenarios; Bandwidth; Bridges; Computational modeling; Microarchitecture; Q measurement; Radiation detectors; Time measurement;
fLanguage
English
Publisher
IEEE
Conference_Titel
Performance Analysis of Systems and Software (ISPASS), 2014 IEEE International Symposium on
Conference_Location
Monterey, CA
Print_ISBN
978-1-4799-3604-5
Type
conf
DOI
10.1109/ISPASS.2014.6844463
Filename
6844463