DocumentCode :
3503598
Title :
Enhancing the fault-tolerance of nonmasking programs
Author :
Kulkarni, Sandeep S. ; Ebnenasir, Ali
Author_Institution :
Dept. of Comput. Sci. & Eng., Michigan State Univ., USA
fYear :
2003
fDate :
19-22 May 2003
Firstpage :
441
Lastpage :
449
Abstract :
In this paper, we focus on automated techniques to enhance the fault-tolerance of a nonmasking fault-tolerant program to masking. A masking program continually satisfies its specification even if faults occur. By contrast, a nonmasking program merely guarantees that after faults stop occurring, the program recovers to states from which it continually satisfies its specification. Until the recovery is complete, however, a nonmasking program can violate its (safety) specification. Thus, the problem of enhancing fault-tolerance from nonmasking to masking requires that safety be added and recovery be preserved. We focus on this enhancement problem for high atomicity programs, where each process can read all variables, and for distributed programs, where restrictions are imposed on what processes can read and write. We present a sound and complete algorithm for high atomicity programs and a sound algorithm for distributed programs. We also argue that our algorithms are simpler than previous algorithms, where masking fault-tolerance is added to a fault-intolerant program. Hence, these algorithms can partially reap the benefits of automation when the cost of adding masking fault-tolerance to a fault-intolerant program is high. To illustrate these algorithms, we show how the masking fault-tolerant programs for triple modular redundancy and Byzantine agreement can be obtained by enhancing the fault-tolerance of the corresponding nonmasking versions. We also discuss how the derivation of these programs is simplified when we begin with a nonmasking fault-tolerant program.
Keywords :
computational complexity; distributed algorithms; distributed programming; fault tolerant computing; formal specification; Byzantine agreement; atomicity program; distributed algorithm; fault-tolerance; formal specification; nonmasking program; program synthesis; triple modular redundancy; Automation; Computer science; Costs; Engineering profession; Fault tolerance; Fault tolerant systems; Laboratories; Redundancy; Safety; Software engineering;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Distributed Computing Systems, 2003. Proceedings. 23rd International Conference on
ISSN :
1063-6927
Print_ISBN :
0-7695-1920-2
Type :
conf
DOI :
10.1109/ICDCS.2003.1203494
Filename :
1203494