Regional Information Center for Science and Technology - An Automatic Parallel-Stage Decoupled Software Pipelining Parallelization Algorithm Based on OpenMP

DocumentCode :

652364

Title :

An Automatic Parallel-Stage Decoupled Software Pipelining Parallelization Algorithm Based on OpenMP

Author :

Xiaoxian Liu ; Rongcai Zhao ; Lin Han ; Peng Liu

Author_Institution :

State Key Lab. of Math. Eng., Adv. Comput., Zhengzhou, China

fYear :

2013

fDate :

16-18 July 2013

Firstpage :

1825

Lastpage :

1831

Abstract :

While multicore processors increase throughput for multi-programmed and multithreaded codes, many important applications are single-threaded and thus do not benefit. Automatic parallelization techniques play an important role in migrating single-threaded applications to multicore platforms. Unfortunately, the prevalence of control flow, recursive data structures, and general pointer accesses in ordinary programs renders traditional automatic parallelization techniques unsuitable. Parallel-Stage Decoupled Software Pipelining (PS-DSWP) was proposed to exploit the fine-grained pipeline parallelism latent in ordinary programs in the presence of all kinds of dependences, including arbitrary control dependences, at the instruction level. However, it requires knowledge of architectural properties and hardware support in the form of a communication channel and two special instructions. In this paper we propose an improved PS-DSWP algorithm based on OpenMP. It is implemented on a high-level intermediate representation and therefore does not rely on a particular CPU architecture. Moreover, the Program Dependence Graph (PDG) used in the algorithm is built from basic blocks, which exploits coarser-grained parallelism than the original PS-DSWP transformation, whose PDG is built from instructions. OpenMP is used to assign tasks and to implement synchronization among threads, avoiding dependence on hardware support. We evaluate loops with complex memory patterns and control flow, which traditional techniques cannot handle, on a multicore platform. With our algorithm these loops can be parallelized and gain significant performance improvements: the maximum speedup is 2.07x and the average is 1.39x with 5 threads.
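The idea of splitting a loop into a sequential stage and a replicated parallel stage, with OpenMP handling task assignment and synchronization, can be illustrated with a small sketch. This is not the algorithm from the paper, which works automatically on a compiler's intermediate representation and a basic-block PDG. It is a hand-written example using the standard OpenMP `single` and `task` constructs on a linked list. The `node` type and the `build_list`, `work`, `process_list`, `sum_results`, and `free_list` names are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical node type standing in for a recursive data structure. */
typedef struct node {
    int value;          /* input read by the parallel stage */
    long result;        /* output written by the parallel stage */
    struct node *next;
} node;

/* Build a list holding 1..n (illustrative helper, not from the paper). */
node *build_list(int n) {
    node *head = NULL;
    for (int i = n; i >= 1; --i) {
        node *p = malloc(sizeof *p);
        assert(p != NULL);
        p->value = i;
        p->result = 0;
        p->next = head;
        head = p;
    }
    return head;
}

/* Hypothetical per-node work; it is independent across nodes. */
long work(int v) {
    return (long)v * v;
}

/* Sequential stage: one thread follows the next pointers, which is a
   loop-carried dependence. Parallel stage: the work for each node is
   spawned as a task that any thread in the team may execute. */
void process_list(node *head) {
    #pragma omp parallel
    {
        #pragma omp single
        {
            for (node *p = head; p != NULL; p = p->next) {
                #pragma omp task firstprivate(p)
                p->result = work(p->value);
            }
        } /* implicit barrier: all spawned tasks have completed here */
    }
}

/* Sequential reduction, run after the parallel region has ended. */
long sum_results(const node *head) {
    long total = 0;
    for (const node *p = head; p != NULL; p = p->next)
        total += p->result;
    return total;
}

/* Release the list. */
void free_list(node *head) {
    while (head != NULL) {
        node *next = head->next;
        free(head);
        head = next;
    }
}
```

A caller would build a list, call `process_list`, and then read the results with `sum_results`. When compiled with an OpenMP flag such as GCC's `-fopenmp`, the tasks are distributed among the threads of the team; without it the pragmas are ignored and the same code runs serially with identical results.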

Keywords :

application program interfaces; data structures; multi-threading; multiprocessing programs; multiprocessing systems; pipeline processing; synchronisation; CPU architectures; OpenMP; PDG; PS-DSWP algorithm; PS-DSWP transformation; architectural properties; automatic parallel-stage decoupled software pipelining parallelization algorithm; automatic parallelization techniques; coarser-grained parallelism; communication channel; complex memory patterns; fine-grained pipeline parallelism lurking; high level intermediate representation; multicore platforms; multicore processors; multiprogrammed codes; multithreaded codes; program dependence graph; recursive data structures; single threaded applications; Instruction sets; Merging; Multicore processing; Pipeline processing; Synchronization; OpenMP; automatic parallelization; parallel-stage decoupled software pipelining

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Trust, Security and Privacy in Computing and Communications (TrustCom), 2013 12th IEEE International Conference on

Conference_Location :

Melbourne, VIC

Type :

conf

DOI :

10.1109/TrustCom.2013.227

Filename :

6681059

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=652364