Title :
On-line testing and recovery in TMR systems for real-time applications
Author :
Yu, Shu-Yi; McCluskey, Edward J.
Author_Institution :
Center for Reliable Computing, Stanford University, CA, USA
Abstract :
Triple Modular Redundancy (TMR) is known to improve reliability in real-time computing systems for short mission times. However, TMR-based systems are not effective for longer missions. For failures caused by transient faults, we have designed a new recovery scheme for TMR systems so that they can be used for long-mission applications. The scheme can effectively recover computing systems from single transient faults without introducing any re-computation delay; hence, it is suitable for real-time applications. A robot controller is used as a case study. Theoretical analysis and implementation results show that, with very small hardware overhead for recovery logic, the proposed scheme can significantly improve system reliability and lengthen system lifetime. A new state restoration scheme for recovery is presented and shown to have lower overhead than a conventional restoration scheme.
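The claim that plain TMR helps only for short missions follows from the textbook TMR reliability model: with module reliability R(t) = e^(-λt), a 2-of-3 system has R_TMR = 3R^2 - 2R^3, which falls below a single module once R < 0.5, i.e. after t = ln 2 / λ. The Python sketch below illustrates that crossover and a generic majority-vote-and-overwrite step. It assumes a constant failure rate, a perfect voter, and no repair; `vote_and_restore` is a hypothetical helper showing the general idea of correcting a single transient error from the voted value, not the restoration scheme proposed in this paper.

```python
import math
from typing import List, Optional, Tuple


def simplex_reliability(lam: float, t: float) -> float:
    """Reliability of one module with constant failure rate lam (exponential model)."""
    return math.exp(-lam * t)


def tmr_reliability(lam: float, t: float) -> float:
    """Textbook TMR reliability: the system works while at least 2 of 3
    identical modules work. Assumes a perfect voter and no repair."""
    r = simplex_reliability(lam, t)
    return 3 * r**2 - 2 * r**3


def crossover_time(lam: float) -> float:
    """Mission time at which TMR and simplex are equally reliable (R = 0.5).
    Beyond this point TMR without recovery is LESS reliable than one module."""
    return math.log(2) / lam


def vote_and_restore(states: List[int]) -> Tuple[int, Optional[int]]:
    """Hypothetical recovery step: bitwise 2-of-3 majority vote over three
    module states (ints used as bit vectors). Returns the voted state and the
    index of the single disagreeing module, or None if all agree or if more
    than one module disagrees (outside the single-fault assumption).
    The caller would overwrite that module's state with the voted value, so
    the output stays correct in the same cycle and no re-computation is needed."""
    a, b, c = states
    voted = (a & b) | (a & c) | (b & c)
    faulty = [i for i, s in enumerate(states) if s != voted]
    return voted, (faulty[0] if len(faulty) == 1 else None)


if __name__ == "__main__":
    lam = 1e-4  # illustrative failure rate, failures per hour
    for t in (1_000.0, 10_000.0):
        print(t, simplex_reliability(lam, t), tmr_reliability(lam, t))
    print("crossover (h):", crossover_time(lam))

    states = [0b1010, 0b1010, 0b1110]  # module 2 hit by a transient bit flip
    voted, bad = vote_and_restore(states)
    if bad is not None:
        states[bad] = voted  # restore the faulty module from the majority
    print(states)
```

With these illustrative numbers, TMR is more reliable than a single module at 1,000 hours but less reliable at 10,000 hours, which is the long-mission weakness the recovery scheme is meant to address.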
Keywords :
automatic testing; computer testing; error detection; fault tolerant computing; logic testing; real-time systems; redundancy; robots; system recovery; TMR-based systems; failures; long mission applications; online testing; real-time applications; real-time computing systems; recovery logic; recovery scheme; robot controller; state restoration scheme; system reliability improvement; transient faults; triple modular redundancy; Delay effects; Error correction; Fault tolerant systems; Hardware; Power system restoration; Real time systems; Redundancy; Robot control; System testing
Conference_Title :
Proceedings of the International Test Conference, 2001
Conference_Location :
Baltimore, MD
Print_ISBN :
0-7803-7169-0
DOI :
10.1109/TEST.2001.966639