Title :
Predicting Potential Speedup of Serial Code via Lightweight Profiling and Emulations with Memory Performance Model
Author :
Kim, Minjang ; Kumar, Pranith ; Kim, Hyesoon ; Brett, Bevin
Author_Institution :
Sch. of Comput. Sci., Georgia Inst. of Technol., Atlanta, GA, USA
Abstract :
We present Parallel Prophet, which projects the potential parallel speedup of an annotated serial program before actual parallelization. Programmers want to see how much speedup could be obtained before investing time and effort in writing parallel code. With Parallel Prophet, programmers simply insert annotations that describe the parallel behavior of the serial program. Parallel Prophet then uses lightweight interval profiling and dynamic emulations to predict the potential performance benefit. Parallel Prophet models many realistic features of parallel programs that are hard to model with previous approaches: unbalanced workloads, multiple critical sections, nested and recursive parallelism, and specific thread schedulings and paradigms. Furthermore, Parallel Prophet predicts speedup saturation caused by memory and caches by monitoring the cache hit ratio and bandwidth consumption of the serial program. We achieve very small runtime overhead (a slowdown of approximately 1.2 to 10 times) and moderate memory consumption. We demonstrate the effectiveness of Parallel Prophet on eight benchmarks from OmpSCR and the NAS Parallel Benchmarks by comparing our predictions with the actual parallelized code. Our simple memory model also identifies performance limitations resulting from memory system contention.
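The abstract does not spell out how the emulation or the memory model works, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes that interval profiling of the serial run yields the duration of each annotated task in a parallel region, emulates handing those tasks to threads like an OpenMP `schedule(dynamic,1)` loop to capture load imbalance, and applies a simple, hypothetical bandwidth floor: the region must move the same bytes as in the serial run, so it cannot finish faster than bytes divided by the machine's peak bandwidth. The function names and parameters (`emulate_dynamic_schedule`, `predict_speedup`, `serial_bandwidth_gbs`, `peak_bandwidth_gbs`) are illustrative assumptions.

```python
import heapq
from typing import List


def emulate_dynamic_schedule(task_times: List[float], num_threads: int) -> float:
    """Hypothetical emulator: hand tasks, in program order, to whichever
    thread becomes free first (like OpenMP schedule(dynamic,1)) and
    return the makespan of the parallel region."""
    if num_threads < 1:
        raise ValueError("num_threads must be >= 1")
    free_at = [0.0] * num_threads  # time at which each thread becomes free
    heapq.heapify(free_at)
    for t in task_times:
        start = heapq.heappop(free_at)      # earliest-available thread
        heapq.heappush(free_at, start + t)  # it is busy until start + t
    return max(free_at)


def predict_speedup(serial_outside: float,
                    task_times: List[float],
                    num_threads: int,
                    serial_bandwidth_gbs: float = 0.0,
                    peak_bandwidth_gbs: float = float("inf")) -> float:
    """Hypothetical speedup predictor (Amdahl-style) with a crude
    memory-bandwidth floor; an assumption, not Parallel Prophet's model."""
    if peak_bandwidth_gbs <= 0:
        raise ValueError("peak_bandwidth_gbs must be positive")
    region_serial = sum(task_times)
    region_parallel = emulate_dynamic_schedule(task_times, num_threads)
    # Assumed: the parallel region moves as many bytes (in GB) as the serial run.
    gigabytes_moved = serial_bandwidth_gbs * region_serial
    memory_floor = gigabytes_moved / peak_bandwidth_gbs
    region_parallel = max(region_parallel, memory_floor)
    total_serial = serial_outside + region_serial
    return total_serial / (serial_outside + region_parallel)


# Eight equal tasks on four threads, plus 1 s of serial code: 9 s / 3 s.
print(predict_speedup(1.0, [1.0] * 8, 4))
# Same workload, but the serial run already uses half of peak bandwidth,
# so the region saturates at 4 s and the predicted speedup drops.
print(predict_speedup(1.0, [1.0] * 8, 4,
                      serial_bandwidth_gbs=10.0, peak_bandwidth_gbs=20.0))
```

In this toy model, an unbalanced task list such as `[4.0, 1.0, 1.0, 1.0, 1.0]` on four threads is bounded by its longest task, and a memory-bound region stops scaling once the bandwidth floor exceeds the emulated makespan, which mirrors (in a very simplified way) the load-imbalance and speedup-saturation effects the abstract describes.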
Keywords :
cache storage; parallel programming; NAS parallel benchmarks; OmpSCR; annotated serial program; bandwidth consumption; cache hit ratio; dynamic emulations; interval profiling; lightweight profiling; memory consumption; memory performance model; memory system contention; nested parallelism; parallel prophet; parallel serial program behavior; potential serial code speedup prediction; recursive parallelism; simple memory model; speedup saturation; Analytical models; Emulation; Instruction sets; Parallel processing; Prediction algorithms; Predictive models; Radiation detectors; Parallelization; Performance; Profiling;
Conference_Title :
Parallel & Distributed Processing Symposium (IPDPS), 2012 IEEE 26th International
Conference_Location :
Shanghai
Print_ISBN :
978-1-4673-0975-2
DOI :
10.1109/IPDPS.2012.128