DocumentCode :
1010784
Title :
FINE: A fault injection and monitoring environment for tracing the UNIX system behavior under faults
Author :
Kao, Wei-lun ; Iyer, Ravishankar K. ; Tang, Dong
Author_Institution :
Center for Reliable & High Performance Comput., Illinois Univ., Urbana, IL, USA
Volume :
19
Issue :
11
fYear :
1993
fDate :
11/1/1993 12:00:00 AM
Firstpage :
1105
Lastpage :
1118
Abstract :
The authors present a fault injection and monitoring environment (FINE) as a tool to study fault propagation in the UNIX kernel. FINE injects hardware-induced software errors and software faults into the UNIX kernel and traces the execution flow and key variables of the kernel. FINE consists of a fault injector, a software monitor, a workload generator, a controller, and several analysis utilities. Experiments on SunOS 4.1.2 are conducted by applying FINE to investigate fault propagation and to evaluate the impact of various types of faults. Fault propagation models are built for both hardware and software faults. Transient Markov reward analysis is performed to evaluate the loss of performance due to an injected fault. Experimental results show that memory and software faults usually have a very long latency, while bus and CPU faults tend to crash the system immediately. About half of the detected errors are data faults, which are detected when the system tries to access an unauthorized memory location. Only about 8% of faults propagate to other UNIX subsystems. Markov reward analysis shows that the performance loss incurred by bus faults and CPU faults is much higher than that incurred by software and memory faults. Among software faults, the impact of pointer faults is higher than that of nonpointer faults.
Keywords :
Unix; program testing; software tools; system monitoring; CPU faults; FINE; SunOS 4.1.2; UNIX system behavior; analysis utilities; bus faults; fault injection and monitoring environment; fault injector; hardware-induced software errors; pointer faults; software faults; software monitor; transient Markov reward analysis; workload generator; Computer crashes; Delay; Fault detection; Hardware; Kernel; Monitoring; Performance analysis; Performance evaluation; Performance loss; Transient analysis;
fLanguage :
English
Journal_Title :
Software Engineering, IEEE Transactions on
Publisher :
IEEE
ISSN :
0098-5589
Type :
jour
DOI :
10.1109/32.256857
Filename :
256857