DocumentCode :
129142
Title :
Tightly-coupled hardware support to dynamic parallelism acceleration in embedded shared memory clusters
Author :
Burgio, Paolo ; Tagliavini, Giuseppe ; Conti, Francesco ; Marongiu, Andrea ; Benini, Luca
Author_Institution :
DEI, Univ. degli Studi di Bologna, Bologna, Italy
fYear :
2014
fDate :
24-28 March 2014
Firstpage :
1
Lastpage :
6
Abstract :
Modern designs for embedded systems are increasingly embracing cluster-based architectures, where small sets of cores communicate through tightly-coupled shared memory banks and high-performance interconnections. At the same time, the complexity of modern applications requires new programming abstractions to exploit dynamic and/or irregular parallelism on such platforms. Supporting dynamic parallelism in systems which i) are resource-constrained and ii) run applications with small units of work calls for a runtime environment which has minimal overhead for the scheduling of parallel tasks. In this work, we study the major sources of overhead in the implementation of OpenMP dynamic loops, sections and tasks, and propose a hardware implementation of a generic Scheduling Engine (HWSE) which fits the semantics of the three constructs. The HWSE is designed as a tightly-coupled block to the PEs within a multi-core cluster, communicating through a shared-memory interface. This allows very fast programming and synchronization with the controlling PEs, fundamental to achieving fast dynamic scheduling, and ultimately to enable fine-grained parallelism. We prove the effectiveness of our solutions with real applications and synthetic benchmarks, using a cycle-accurate virtual platform.
Keywords :
embedded systems; parallel processing; scheduling; shared memory systems; HWSE; OpenMP dynamic loops; cluster-based architectures; controlling PE; cycle accurate virtual platform; dynamic parallelism acceleration; embedded shared memory clusters; embedded systems; fast dynamic scheduling; fine-grained parallelism; generic scheduling engine; high performance interconnections; irregular parallelism; multicore cluster; parallel tasks; programming abstractions; runtime environment; shared memory interface; synchronization; synthetic benchmarks; tightly coupled hardware support; tightly coupled shared memory banks; Acceleration; Computer architecture; Dynamic scheduling; Hardware; Parallel processing; Programming; Software;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Design, Automation and Test in Europe Conference and Exhibition (DATE), 2014
Conference_Location :
Dresden
Type :
conf
DOI :
10.7873/DATE.2014.169
Filename :
6800370