DocumentCode :
166155
Title :
Applying the roofline model
Author :
Ofenbeck, Georg ; Steinmann, Ruedi ; Caparros, Victoria ; Spampinato, Daniele G. ; Puschel, Markus
Author_Institution :
Dept. of Comput. Sci., ETH Zurich, Zurich, Switzerland
fYear :
2014
fDate :
23-25 March 2014
Firstpage :
76
Lastpage :
85
Abstract :
The recently introduced roofline model plots the performance of executed code against its operational intensity (operations count divided by memory traffic). It also includes two platform-specific performance ceilings: the processor's peak performance and a ceiling derived from the memory bandwidth, which is relevant for code with low operational intensity. The model thus makes more precise the notions of memory- and compute-bound and, despite its simplicity, can provide an insightful visualization of bottlenecks. As such it can be valuable to guide manual code optimization as well as in education. Unfortunately, to date the model has been used almost exclusively with back-of-the-envelope calculations and not with measured data. In this paper we show how to produce roofline plots with measured data on recent generations of Intel platforms. We show how to accurately measure the necessary quantities for a given program using performance counters, including threaded and vectorized code, and for warm and cold cache scenarios. We explain the measurement approach, its validation, and discuss limitations. Finally, we show, to this extent for the first time, a set of roofline plots with measured data for common numerical functions on a variety of platforms and discuss their possible uses.
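The two ceilings named in the abstract can be illustrated with a minimal sketch. It assumes the usual roofline formulation, in which attainable performance is the smaller of the processor's peak performance and the memory bandwidth multiplied by the operational intensity. The function names and the platform numbers below are hypothetical and are not taken from the paper or from any measured platform.

```python
def attainable_performance(intensity, peak_perf, mem_bandwidth):
    """Upper bound on performance (operations/s) under the roofline model.

    intensity     -- operational intensity: operations per byte of memory traffic
    peak_perf     -- processor peak performance in operations/s
    mem_bandwidth -- memory bandwidth in bytes/s
    """
    if intensity <= 0:
        raise ValueError("operational intensity must be positive")
    # Bandwidth ceiling grows linearly with intensity; the peak ceiling is flat.
    return min(peak_perf, mem_bandwidth * intensity)


def ridge_point(peak_perf, mem_bandwidth):
    """Operational intensity at which the two ceilings intersect."""
    return peak_perf / mem_bandwidth


def classify(intensity, peak_perf, mem_bandwidth):
    """Label code as memory-bound or compute-bound relative to the ridge point."""
    if intensity < ridge_point(peak_perf, mem_bandwidth):
        return "memory-bound"
    return "compute-bound"


# Hypothetical platform (assumed values, for illustration only).
PEAK = 100e9       # 100 Gops/s
BANDWIDTH = 20e9   # 20 GB/s

low = attainable_performance(1.0, PEAK, BANDWIDTH)    # limited by bandwidth
high = attainable_performance(10.0, PEAK, BANDWIDTH)  # limited by peak
```

With these assumed numbers the ridge point lies at 5 operations per byte, so code with an intensity of 1 falls under the bandwidth ceiling and code with an intensity of 10 falls under the peak ceiling.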
Keywords :
cache storage; multi-threading; program compilers; software performance evaluation; source code (software); Intel platforms; back-of-the-envelope calculations; bottleneck visualization; cold cache scenarios; compute-bound; executed code performance; manual code optimization; memory bandwidth; memory traffic; memory-bound; operation count; operational intensity; performance counters; platform-specific performance ceilings; processor peak performance; roofline model; roofline plots; threaded code; vectorized code; warm cache scenarios; Bandwidth; Bridges; Computational modeling; Microarchitecture; Q measurement; Radiation detectors; Time measurement;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Performance Analysis of Systems and Software (ISPASS), 2014 IEEE International Symposium on
Conference_Location :
Monterey, CA
Print_ISBN :
978-1-4799-3604-5
Type :
conf
DOI :
10.1109/ISPASS.2014.6844463
Filename :
6844463