DocumentCode :
963935
Title :
On Switching Policies for Modular Redundancy Fault-Tolerant Computing Systems
Author :
Berg, Menachem ; Koren, Israel
Author_Institution :
Department of Industrial Engineering, University of Toronto, Toronto, Ont., Canada M5S 1A4; Department of Statistics, University of Haifa, Haifa 31999, Israel.
Issue :
9
fYear :
1987
Firstpage :
1052
Lastpage :
1062
Abstract :
The objective of fault-tolerant computing systems is to provide error-free operation in the presence of faults. The system has to recover from the effects of a fault by employing recovery procedures such as program rollback, reload, and restart. However, these recovery procedures result in interruptions in the system's operation, thus reducing the availability of the system for user applications. Fault-tolerant systems for critical applications therefore include standby spares that are ready to replace active modules which fail to recover from the effects of a fault. A standby spare may also be used to replace a module suffering from frequent fault occurrences that cause too many repetitions of the recovery process, in order to increase the availability of the system for user applications. In this case a module switching policy is needed, indicating, upon a fault occurrence, whether to retry a failing module or switch it out and replace it with a spare, considering the remaining mission time and the probability of a system crash. A module switching policy for dynamic redundancy systems is presented in this paper, and the improvement in application-oriented availability due to the use of this policy is illustrated.
Keywords :
Availability; Capacity planning; Electromagnetic transients; Error-free operation; Fault tolerant systems; Hardware; Redundancy; Steady-state; Switches; Time measurement; Application-oriented availability; deterioration models; failure rate; fault tolerance; modular redundancy; module switching policy; recovery; standby spare
fLanguage :
English
Journal_Title :
Computers, IEEE Transactions on
Publisher :
IEEE
ISSN :
0018-9340
Type :
jour
DOI :
10.1109/TC.1987.5009536
Filename :
5009536