Title :
Faults, symptoms, and software fault tolerance in the Tandem GUARDIAN90 operating system
Author :
Lee, Inhwan ; Iyer, Ravishankar K.
Author_Institution :
Coordinated Sci. Lab., Illinois Univ., Urbana, IL, USA
Abstract :
The authors present a measurement-based study of software failures and recovery in the Tandem GUARDIAN90 operating system, using a collection of memory dump analyses of field software failures. They identify the effects of software faults on the processor state and trace the propagation of those effects to other areas of the system. They also evaluate the role of defensive programming techniques and the software fault tolerance provided by the process pair mechanism implemented in the Tandem system. Results show that the Tandem system tolerates nearly 82% of reported field software faults, demonstrating its effectiveness against software faults. Consistency checks made by the operating system detect 52% of software problems and prevent any error propagation in 31% of software problems. Results also show that 72% of reported field software failures are recurrences of known software faults and that 70% of the recurrence groups have identical characteristics.
Keywords :
software fault tolerance; Tandem GUARDIAN90 operating system; consistency checks; defensive programming techniques; field software failures; measurement-based study; memory dump analyses; process pair mechanism; processor state; recurrence groups; software failures; Coordinate measuring machines; Error analysis; Failure analysis; Fault diagnosis; Fault tolerant systems; Operating systems; Programming; Software maintenance; Software measurement; Software systems
Conference_Title :
The Twenty-Third International Symposium on Fault-Tolerant Computing (FTCS-23), 1993. Digest of Papers.
Conference_Location :
Toulouse, France
Print_ISBN :
0-8186-3680-7
DOI :
10.1109/FTCS.1993.627304