DocumentCode :
1761082
Title :
Design and Evaluation of Confidence-Driven Error-Resilient Systems
Author :
Chia-Hsiang Chen ; Blaauw, D. ; Sylvester, Dennis ; Zhengya Zhang
Author_Institution :
Dept. of Electr. Eng. & Comput. Sci., Univ. of Michigan, Ann Arbor, MI, USA
Volume :
22
Issue :
8
fYear :
2014
fDate :
Aug. 2014
Firstpage :
1727
Lastpage :
1737
Abstract :
Deeply scaled CMOS circuits are increasingly susceptible to transient faults and soft errors; emerging post-CMOS devices can be more vulnerable, sometimes exhibiting erratic errors of arbitrary duration. Adding timing and supply-voltage margins is wasteful and increasingly ineffective, and conventional checking and sparing techniques provide only limited coverage against widely varying errors. We propose a confidence-driven computing (CDC) model for adaptive protection against nondeterministic errors. The CDC model employs fine-grained temporal redundancy and confidence checking for faster adaptation and tunable reliability. The CDC model can be extended to deeply scaled CMOS circuits that are mainly affected by transient faults and soft errors, where an early checking (EC) technique can be used to perform independent error checking for more flexibility and better performance. To evaluate the CDC model, we apply a sample-based field-programmable gate array emulation along with real-time error injection. The CDC model is shown to adapt to fluctuating error rates and to improve system reliability by trading off performance. To evaluate the EC technique at a finer time scale, we create a new event-based simulation to capture the path delay distribution, the error model, and their interactions. The EC technique improves system reliability by more than four orders of magnitude when errors are of short duration. Both the CDC model and the EC technique are synthesized in a 45-nm CMOS technology for cost estimates: the area overhead is as low as 12%, and the energy overhead can be limited to 19%.
Keywords :
CMOS integrated circuits; error detection; field programmable gate arrays; integrated circuit reliability; radiation hardening (electronics); semiconductor device reliability; transients; CMOS circuits; adaptation reliability; adaptive protection; arbitrary duration; checking techniques; confidence-driven computing model; confidence-driven error-resilient systems; delay distribution; erratic errors; error coverage; fine-grained temporal redundancy; fluctuating error rates; nondeterministic errors; real-time error injection; sample-based field-programmable gate array emulation; size 45 nm; soft errors; sparing techniques; supply voltage margin; system reliability; time scale; timing margin; transient faults; tunable reliability; Delays; Emulation; Error analysis; Redundancy; Semiconductor device modeling; Synchronization; Error detection; error simulation; field-programmable gate array (FPGA) emulation; reliability; resilient design
fLanguage :
English
Journal_Title :
Very Large Scale Integration (VLSI) Systems, IEEE Transactions on
Publisher :
IEEE
ISSN :
1063-8210
Type :
jour
DOI :
10.1109/TVLSI.2013.2277351
Filename :
6585814