Title :
Recovery Device for Real-Time Dual-Redundant Computer Systems
Author_Institution :
Dept. of Comput. Eng., Ankara Univ., Ankara, Turkey
Abstract :
This paper proposes the design of specialized hardware, called the Recovery Device, for a dual-redundant computer system that operates in real time. The Recovery Device executes all fault-tolerant services, including fault detection, fault type determination, fault localization, recovery of the system after a temporary (transient) fault, and reconfiguration of the system after a permanent fault. The paper also proposes algorithms for determining the fault type (whether the fault is temporary or permanent) and for localizing the faulty computer without using self-testing techniques or diagnosis routines. Determining the fault type allows only a computer with a permanent fault to be eliminated; in other words, it prevents the elimination of a nonfaulty computer because of a short temporary fault. Localizing the faulty computer without self-testing techniques or diagnosis routines shortens the recovery point time period and reduces the probability that a fault will occur during the execution of the fault-tolerant procedure, which is very important for real-time fault-tolerant systems. These contributions increase both system performance and the degree of system reliability.
Keywords :
probability; program testing; real-time systems; redundancy; software fault tolerance; fault detection; fault diagnosis; fault localization; fault-tolerant services; real-time dual-redundant computer systems; recovery device; self-testing techniques; short temporary fault; system performance; system reconfiguration; system reliability; dual-redundant computer system; fault-tolerant procedure; hardware implementation; real-time; recovery point; temporary and permanent faults
Journal_Title :
Dependable and Secure Computing, IEEE Transactions on
DOI :
10.1109/TDSC.2010.12