DocumentCode :
3480566
Title :
Energy-Aware Fault-Tolerant CGRAs Addressing Application with Different Reliability Needs
Author :
Jafri, Syed Mohammad Asad Hassan ; Piestrak, Stanislaw J. ; Paul, Kolin ; Hemani, Ahmed ; Plosila, Juha ; Tenhunen, Hannu
Author_Institution :
Turku Centre for Computer Science (TUCS), Turku, Finland
fYear :
2013
fDate :
4-6 Sept. 2013
Firstpage :
525
Lastpage :
534
Abstract :
In this paper, we propose a polymorphic fault-tolerant architecture that can be tailored to efficiently support the reliability needs of multiple applications at run-time. Today, coarse-grained reconfigurable architectures (CGRAs) host multiple applications with potentially different reliability needs. Providing platform-wide worst-case (maximum) protection to all the applications is neither optimal nor desirable. To reduce the fault-tolerance overhead, adaptive fault-tolerance strategies have been proposed. These techniques assess the reliability requirements of each application and adjust the fault-tolerance intensity (and hence the overhead) accordingly. However, existing flexible reliability schemes only allow shifting between different levels of modular redundancy (duplication, triplication, etc.) and deal with only a single class of faults (e.g. soft errors). To complement these strategies, we propose energy-aware fault tolerance that, in addition to modular redundancy, can also provide low-cost, sub-modular (e.g. residue mod 3) redundancy, to cater for both permanent and temporary faults. Our solution relies on an agent-based control layer and a configurable fault-tolerance data path. The control layer identifies the application class and configures the data path to provide the needed reliability. Simulation results using a few selected algorithms (FFT, matrix multiplication, and FIR filter) showed that the proposed method provides flexible protection with energy overhead ranging from 3.125% to 107% for different reliability levels. Synthesis results have confirmed that the proposed architecture significantly reduces the area overhead for self-checking (59.1%) and fault-tolerant (7.1%) versions, compared to the state-of-the-art adaptive reliability techniques.
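The abstract names residue mod 3 as an example of low-cost, sub-modular redundancy. The sketch below illustrates the general idea of residue mod 3 checking for addition and multiplication; it is a generic software illustration, not the paper's hardware design, and the function names (`residue3`, `checked_add`, `checked_mul`) and the `injected_result` parameter are hypothetical and used only for demonstration.

```python
def residue3(x: int) -> int:
    """Return the mod 3 residue that serves as the check symbol for x."""
    return x % 3


def checked_add(a: int, b: int, injected_result=None):
    """Add a and b, then verify the sum against its predicted mod 3 residue.

    injected_result (hypothetical) lets a caller simulate a faulty adder
    output. Returns (result, ok), where ok is False if a mismatch is found.
    """
    result = a + b if injected_result is None else injected_result
    predicted = (residue3(a) + residue3(b)) % 3  # residue computed independently
    return result, residue3(result) == predicted


def checked_mul(a: int, b: int, injected_result=None):
    """Multiply a and b, then verify the product against its predicted residue."""
    result = a * b if injected_result is None else injected_result
    predicted = (residue3(a) * residue3(b)) % 3
    return result, residue3(result) == predicted


if __name__ == "__main__":
    print(checked_add(5, 7))             # fault-free addition
    print(checked_add(5, 7, 12 ^ 0b1))   # lowest bit of the sum flipped
    print(checked_mul(6, 4))             # fault-free multiplication
```

A single bit flip changes the result by plus or minus a power of two, which is never a multiple of 3, so this check detects every single-bit error in the result. It only detects errors, though; correcting them requires additional redundancy such as the modular schemes the abstract mentions.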
Keywords :
fault tolerant computing; power aware computing; reconfigurable architectures; redundancy; FFT algorithm; FIR filter algorithm; adaptive fault-tolerance strategies; agent-based control layer; application class identification; coarse-grained reconfigurable architectures; configurable fault-tolerance data path; energy overhead; energy-aware fault-tolerant CGRA; fault-tolerance intensity; fault-tolerance overhead reduction; fault-tolerant area overhead reduction; low-cost submodular redundancy; matrix multiplication algorithm; maximum platform-wide worst-case protection; modular redundancy levels; permanent faults; polymorphic fault-tolerant architecture; reliability needs; residue mod-3 redundancy; self-checking area overhead reduction; temporary faults; Circuit faults; Computer architecture; Digital signal processing; Fault tolerant systems; Redundancy; Adaptive systems; CGRAs; Energy aware; Fault tolerance; Low power;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Digital System Design (DSD), 2013 Euromicro Conference on
Conference_Location :
Los Alamitos, CA
Type :
conf
DOI :
10.1109/DSD.2013.62
Filename :
6628323