DocumentCode
3201082
Title
PCERE: Fine-Grained Parallel Benchmark Decomposition for Scalability Prediction
Author
Popov, Mihail ; Akel, Chadi ; Conti, Florent ; Jalby, William ; De Oliveira Castro, Pablo
Author_Institution
Exascale Comput. Res., Univ. de Versailles St. Quentin-en-Yvelines, Versailles, France
fYear
2015
fDate
25-29 May 2015
Firstpage
1151
Lastpage
1160
Abstract
Evaluating the strong scalability of OpenMP applications is a costly and time-consuming process. It traditionally requires executing the whole application multiple times with different numbers of threads. We propose the Parallel Codelet Extractor and REplayer (PCERE), a tool to reduce the cost of scalability evaluation. PCERE decomposes applications into small pieces called codelets: each codelet maps to an OpenMP parallel region and can be replayed as a standalone program. To accelerate scalability prediction, PCERE replays codelets while varying the number of threads. Prediction speedup comes from two key ideas. First, the number of invocations during replay can be significantly reduced: invocations that have the same performance are grouped together and a single representative is replayed. Second, sequential parts of the program do not need to be replayed for each different thread configuration. PCERE codelets can be captured once and replayed accurately on multiple architectures, enabling cross-architecture parallel performance prediction. We evaluate PCERE on a C version of the NAS 3.0 Parallel Benchmarks (NPB). We achieve an average speedup of 25× in evaluating the scalability of OpenMP applications, with an average error of 4.9% (median error of 1.7%).
Keywords
benchmark testing; parallel processing; software architecture; software performance evaluation; NAS 3.0 parallel benchmarks; NPB; OpenMP applications; PCERE; cross-architecture parallel performance prediction; fine-grained parallel benchmark decomposition; parallel codelet extractor and replayer; scalability prediction; thread configuration; Accuracy; Context; In vivo; Instruction sets; Optimization; Scalability; checkpoint restart; cross-architecture performance prediction; parallel code isolation; program replay
fLanguage
English
Publisher
IEEE
Conference_Titel
Parallel and Distributed Processing Symposium (IPDPS), 2015 IEEE International
Conference_Location
Hyderabad
ISSN
1530-2075
Type
conf
DOI
10.1109/IPDPS.2015.19
Filename
7161599