DocumentCode :
167473
Title :
OpenMP Task Scheduling Analysis via OpenMP Runtime API and Tool Visualization
Author :
Qawasmeh, Ahmad ; Malik, Abid M. ; Chapman, Barbara M.
Author_Institution :
Dept. of Comput. Sci., Univ. of Houston, Houston, TX, USA
fYear :
2014
fDate :
19-23 May 2014
Firstpage :
1049
Lastpage :
1058
Abstract :
OpenMP tasks introduce a new dimension of concurrency to capture irregular parallelism within applications. The addition of OpenMP tasks allows programmers to express concurrency at a high level of abstraction and makes the OpenMP runtime responsible for the burden of scheduling parallel execution. The ability to observe the performance of OpenMP task scheduling strategies portably across shared memory platforms has been a challenge due to the lack of performance interface standards in the runtime layer. In this paper, we exploit our proposed tasking extensions to the OpenMP Runtime API (ORA), known as the Collector APIs, for profiling task-level parallelism. We describe the integration of these Collector APIs, implemented in the OpenUH compiler, into the TAU performance system. Our proposed task extensions are in line with the new interface specification called OMPT, which is currently under evaluation by the OpenMP community. We use this integration to analyze various OpenMP task scheduling strategies implemented in OpenUH. The capabilities of these scheduling strategies are evaluated with respect to exploiting data locality, maintaining load balance, and minimizing overhead costs. We present a comprehensive performance study of diverse OpenMP benchmarks from the Barcelona OpenMP Test Suite, comparing different task pools (DEFAULT, SIMPLE, SIMPLE_2LEVEL, PUBLIC_PRIVATE), task queues (DEQUE, FIFO, CFIFO, LIFO, INV_DEQUE), and task queue storage types (ARRAY, DYN_ARRAY, LIST, LOCKLESS) on a 48-core AMD Opteron multicore system. Our results show that benchmarks with similar characteristics exhibit the same behavior in terms of the performance of the applied scheduling strategies. Moreover, the task pool configuration, which controls the organization of task queues, was found to have the highest impact on performance.
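For context, the following is a minimal sketch (not code from the paper) of the OpenMP explicit tasking model the abstract refers to: the programmer only declares tasks, and the runtime's scheduler decides which thread runs each task and in what order, which is exactly the layer the Collector APIs/OMPT probes observe.

/* Minimal OpenMP tasking sketch: recursive Fibonacci with explicit tasks.
 * Compile with an OpenMP-enabled compiler, e.g. gcc -fopenmp fib.c */
#include <stdio.h>
#include <omp.h>

static long fib(int n)
{
    long x, y;
    if (n < 2)
        return n;
    /* Each recursive call is packaged as a task; the runtime may
     * schedule it on any thread in the team. */
    #pragma omp task shared(x) firstprivate(n)
    x = fib(n - 1);
    #pragma omp task shared(y) firstprivate(n)
    y = fib(n - 2);
    #pragma omp taskwait   /* wait for both child tasks before combining */
    return x + y;
}

int main(void)
{
    long result = 0;
    #pragma omp parallel
    #pragma omp single      /* one thread generates the task tree */
    result = fib(20);
    printf("fib(20) = %ld\n", result);
    return 0;
}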
Keywords :
application program interfaces; data visualisation; program compilers; scheduling; task analysis; AMD Opteron multicore system; Barcelona OpenMP Test Suite; OMPT; ORA; OpenMP benchmarks; OpenMP community; OpenMP runtime API; OpenMP task scheduling analysis; OpenUH compiler; TAU performance system; collector API; data locality; interface specification; irregular parallelism; parallel execution; performance interface standards; runtime layer; shared memory platforms; task level parallelism; task pool configuration; task pools; tool visualization; Benchmark testing; Educational institutions; Instruction sets; Parallel processing; Runtime; Scheduling; Standards; Collector APIs; OpenMP; OpenMP tools; Task scheduling;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
Parallel & Distributed Processing Symposium Workshops (IPDPSW), 2014 IEEE International
Conference_Location :
Phoenix, AZ
Print_ISBN :
978-1-4799-4117-9
Type :
conf
DOI :
10.1109/IPDPSW.2014.116
Filename :
6969496