DocumentCode :
652364
Title :
An Automatic Parallel-Stage Decoupled Software Pipelining Parallelization Algorithm Based on OpenMP
Author :
Xiaoxian Liu ; Rongcai Zhao ; Lin Han ; Peng Liu
Author_Institution :
State Key Lab. of Math. Eng., Adv. Comput., Zhengzhou, China
fYear :
2013
fDate :
16-18 July 2013
Firstpage :
1825
Lastpage :
1831
Abstract :
While multicore processors increase throughput for multi-programmed and multithreaded codes, many important applications are single-threaded and thus do not benefit. Automatic parallelization techniques play an important role in migrating single-threaded applications to multicore platforms. Unfortunately, the prevalence of control flow, recursive data structures, and general pointer accesses in ordinary programs renders traditional automatic parallelization techniques unsuitable. Parallel-Stage Decoupled Software Pipelining (PS-DSWP) was proposed to exploit, at the instruction level, the fine-grained pipeline parallelism present in ordinary programs in the presence of all kinds of dependences, including arbitrary control dependences. However, it requires knowledge of architectural properties and hardware support in the form of a communication channel and two special instructions. We propose an improved PS-DSWP algorithm based on OpenMP in this paper. It is implemented on a high-level intermediate representation and therefore does not rely on a particular CPU architecture. Moreover, the Program Dependence Graph (PDG) used in the algorithm is built from basic blocks, which exploits coarser-grained parallelism than the original PS-DSWP transformation, whose PDG is built from instructions. OpenMP is employed in our algorithm to assign tasks and implement synchronization among threads while avoiding dependence on hardware support. We evaluate loops with complex memory patterns and control flow, which cannot be handled by traditional techniques, on a multicore platform. With our algorithm, these loops can be parallelized and gain significant performance improvement. We obtain a maximum speedup of 2.07x and an average speedup of 1.39x with 5 threads.
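The pipeline structure the abstract describes can be illustrated with a small hand-written sketch. It is not the code the paper's algorithm generates, and the record does not say which OpenMP constructs the authors use; here OpenMP task dependences (OpenMP 4.0 and later) are swapped in as one possible way to express the stages. The names `Node`, `heavy_work`, and `pipeline_sum` are hypothetical. A sequential stage walks a linked list, a parallel stage does independent per-node work, and a final sequential stage accumulates the results in order.

```c
#include <stdlib.h>

/* Hypothetical linked-list node; not taken from the paper. */
typedef struct Node {
    int value;
    struct Node *next;
} Node;

/* Hypothetical per-node work with no loop-carried dependence. */
static long heavy_work(int v) {
    long acc = 0;
    for (int k = 1; k <= 1000; ++k)
        acc += ((long)v * k) % 7;
    return acc;
}

/* Three-stage pipeline over at most n nodes; returns the accumulated sum
 * (or -1 if the scratch buffer cannot be allocated). */
long pipeline_sum(Node *head, int n) {
    if (n <= 0)
        return 0;
    long *results = malloc((size_t)n * sizeof *results);
    if (results == NULL)
        return -1;
    long sum = 0;

    #pragma omp parallel
    #pragma omp single
    {
        int i = 0;
        for (Node *p = head; p != NULL && i < n; p = p->next, ++i) {
            /* Stage 1 (sequential): pointer chasing stays on one thread. */
            int v = p->value;

            /* Stage 2 (parallel): each node's work is independent, so
             * these tasks may run concurrently on different threads. */
            #pragma omp task firstprivate(i, v) shared(results) depend(out: results[i])
            results[i] = heavy_work(v);

            /* Stage 3 (sequential): waits for results[i], and the inout
             * dependence on sum serializes the accumulation tasks. */
            #pragma omp task firstprivate(i) shared(results, sum) depend(in: results[i]) depend(inout: sum)
            sum += results[i];
        }
    } /* the barrier ending the parallel region waits for all tasks */

    free(results);
    return sum;
}
```

Built with an OpenMP-enabled compiler (for example with `-fopenmp` on GCC or Clang), the stage 2 tasks can overlap with the list traversal and with each other. Built without OpenMP, the pragmas are ignored and the loop runs sequentially with the same result.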
Keywords :
application program interfaces; data structures; multi-threading; multiprocessing programs; multiprocessing systems; pipeline processing; synchronisation; CPU architectures; OpenMP; PDG; PS-DSWP algorithm; PS-DSWP transformation; architectural properties; automatic parallel-stage decoupled software pipelining parallelization algorithm; automatic parallelization techniques; coarser-grained parallelism; communication channel; complex memory patterns; fine-grained pipeline parallelism lurking; high level intermediate representation; multicore platforms; multicore processors; multiprogrammed codes; multithreaded codes; program dependence graph; recursive data structures; single threaded applications; Instruction sets; Merging; Multicore processing; Pipeline processing; Synchronization; OpenMP; automatic parallelization; parallel-stage decoupled software pipelining;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Trust, Security and Privacy in Computing and Communications (TrustCom), 2013 12th IEEE International Conference on
Conference_Location :
Melbourne, VIC
Type :
conf
DOI :
10.1109/TrustCom.2013.227
Filename :
6681059