Title :
TransPar: Transformation based dynamic Parallelism for low power CGRAs
Author :
Jafri, Syed Mohammad Asad Hassan ; Serrano, Guillermo ; Daneshtalab, Masoud ; Abbas, Nadine ; Hemani, Ahmed ; Paul, Kolin ; Plosila, Juha ; Tenhunen, Hannu
Author_Institution :
Turku Centre for Comput. Sci., Turku, Finland
Abstract :
Coarse Grained Reconfigurable Architectures (CGRAs) are emerging as enabling platforms to meet the high performance demanded by modern applications (e.g. 4G and CDMA). Recently proposed CGRAs offer runtime parallelism to reduce energy consumption (by lowering voltage/frequency). To implement runtime parallelism, CGRAs commonly store multiple compile-time generated implementations of an application (with different degrees of parallelism) and select the optimal version at runtime. However, compile-time binding incurs excessive configuration memory overheads and/or is unable to parallelize an application even when sufficient resources are available. As a solution to this problem, we propose Transformation based dynamic Parallelism (TransPar). TransPar stores only a single implementation and applies a series of transformations to generate the bitstream for the parallel version. In addition, it allows an application to be displaced and/or rotated so that it can be parallelized in resource-constrained scenarios. By storing only a single implementation, TransPar offers significant reductions in configuration memory requirements (up to 73% for the tested applications) compared to state-of-the-art compaction techniques. Simulation and synthesis results, using real applications, reveal that the additional flexibility allows up to 33% energy reduction compared to static memory based parallelism techniques. Gate-level analysis reveals that TransPar incurs negligible silicon (0.2% of the platform) and timing (6 additional cycles per application) penalties.
Keywords :
parallel programming; reconfigurable architectures; TransPar; coarse grained reconfigurable architecture; compile-time binding; configuration memory requirements; energy consumption reduction; frequency reduction; gate level analysis; low power CGRA; negligible silicon penalty; runtime parallelism; static memory based parallelism techniques; timing penalty; transformation based dynamic parallelism; voltage reduction; Delays; MPEG 4 Standard; Memory management; Parallel processing; Runtime; Wireless LAN;
Conference_Title :
Field Programmable Logic and Applications (FPL), 2014 24th International Conference on
Conference_Location :
Munich
DOI :
10.1109/FPL.2014.6927485