Title :
Designing masking fault-tolerance via nonmasking fault-tolerance

Author :
Arora, Anish; Kulkarni, Sandeep S.

Author_Institution :
Dept. of Comput. Sci., Ohio State Univ., Columbus, OH, USA

fDate :
6/1/1998 12:00:00 AM

Abstract :
Masking fault-tolerance guarantees that programs continually satisfy their specification in the presence of faults. By way of contrast, nonmasking fault-tolerance does not guarantee as much: it merely guarantees that when faults stop occurring, program executions converge to states from where programs continually (re)satisfy their specification. We present in this paper a component-based method for the design of masking fault-tolerant programs. In this method, components are added to a fault-intolerant program in a stepwise manner, first, to transform the fault-intolerant program into a nonmasking fault-tolerant one and, then, to enhance the fault-tolerance from nonmasking to masking. We illustrate the method by designing programs for agreement in the presence of Byzantine faults, data transfer in the presence of message loss, triple modular redundancy in the presence of input corruption, and mutual exclusion in the presence of process fail-stops. These examples also serve to demonstrate that the method accommodates a variety of fault-classes. It provides alternative designs for programs usually designed with extant design methods, and it offers the potential for improved masking fault-tolerant programs.

Keywords :
formal specification; software fault tolerance; Byzantine faults; fault-intolerant program; masking fault-tolerance; masking fault-tolerant programs; nonmasking fault-tolerance; process fail-stops; program executions; specification; triple modular redundancy; Costs; Design methodology; Detectors; Fault detection; Fault tolerance; Fault tolerant systems; Interconnected systems; Redundancy

Journal_Title :
Software Engineering, IEEE Transactions on