Title :
Low Overhead Soft Error Mitigation Techniques for High-Performance and Aggressive Designs
Author :
Avirneni, Naga Durga Prasad ; Somani, Arun K.
Author_Institution :
Dept. of Electr. & Comput. Eng., Iowa State Univ., Ames, IA, USA
fDate :
4/1/2012 12:00:00 AM
Abstract :
The threat of soft-error-induced system failure in computing systems has become more prominent with the adoption of ultradeep submicron process technologies. In this paper, we propose two efficient soft error mitigation schemes, namely, Soft Error Mitigation (SEM) and Soft and Timing Error Mitigation (STEM), which use multiple clocking of data to protect combinational logic blocks from soft errors. Our first technique, SEM, based on distributed and temporal voting of three registers, removes the soft error detection overhead from the critical path of the system. SEM can also ignore false errors, and it recovers from soft errors through fast in-situ recovery that avoids recomputation. Our second technique, STEM, tolerates soft errors and adds timing error detection capability to guarantee reliable execution in aggressively clocked designs, which enhance system performance by operating beyond the worst-case clock frequency. We also present a specialized low overhead clock phase management scheme that supports our proposed techniques. Timing-annotated gate-level simulations, using 45 nm libraries, of a pipelined adder-multiplier and a DLX processor show that both techniques achieve near 100 percent fault coverage. For the DLX processor, even under severe fault injection campaigns, SEM achieves an average performance improvement of 26.58 percent over a conventional triple modular redundancy voter-based soft error mitigation scheme, while STEM outperforms SEM by 27.42 percent.
Keywords :
adders; circuit simulation; clocks; combinational circuits; error detection; fault tolerant computing; integrated circuit reliability; multiplying circuits; performance evaluation; pipeline processing; redundancy; timing circuits; DLX processor; aggressive designs; combinational logic blocks; distributed voting; fault injection campaigns; high-performance designs; low overhead clock phase management scheme; low overhead soft error mitigation techniques; modular redundancy voter-based soft error mitigation scheme; multiple data clocking approach; pipelined adder-multiplier processor; soft error detection; soft error induced system failure; temporal voting; timing error detection capability; timing error mitigation; timing-annotated gate-level simulations; ultradeep submicron process technologies; worst-case clock frequency; Clocks; Delay; Redundancy; Registers; Stem cells; Soft errors; adaptive systems; error detection; overclocking; performance; reliability
Journal_Title :
Computers, IEEE Transactions on