• DocumentCode
    1010784
  • Title

    FINE: A fault injection and monitoring environment for tracing the UNIX system behavior under faults

  • Author

    Kao, Wei-lun ; Iyer, Ravishankar K. ; Tang, Dong

  • Author_Institution
    Center for Reliable & High Performance Comput., Illinois Univ., Urbana, IL, USA
  • Volume
    19
  • Issue
    11
  • fYear
    1993
  • fDate
    11/1/1993 12:00:00 AM
  • Firstpage
    1105
  • Lastpage
    1118
  • Abstract
    The authors present a fault injection and monitoring environment (FINE) as a tool to study fault propagation in the UNIX kernel. FINE injects hardware-induced software errors and software faults into the UNIX kernel and traces the execution flow and key variables of the kernel. FINE consists of a fault injector, a software monitor, a workload generator, a controller, and several analysis utilities. Experiments on SunOS 4.1.2 are conducted by applying FINE to investigate fault propagation and to evaluate the impact of various types of faults. Fault propagation models are built for both hardware and software faults. Transient Markov reward analysis is performed to evaluate the loss of performance due to an injected fault. Experimental results show that memory and software faults usually have a very long latency, while bus and CPU faults tend to crash the system immediately. About half of the detected errors are data faults, which are detected when the system tries to access an unauthorized memory location. Only about 8% of faults propagate to other UNIX subsystems. Markov reward analysis shows that the performance loss incurred by bus faults and CPU faults is much higher than that incurred by software and memory faults. Among software faults, the impact of pointer faults is higher than that of nonpointer faults.
  • Keywords
    Unix; program testing; software tools; system monitoring; CPU faults; FINE; SunOS 4.1.2; UNIX system behavior; analysis utilities; bus faults; fault injection and monitoring environment; fault injector; hardware-induced software errors; pointer faults; software faults; software monitor; transient Markov reward analysis; workload generator; Computer crashes; Delay; Fault detection; Hardware; Kernel; Monitoring; Performance analysis; Performance evaluation; Performance loss; Transient analysis
  • fLanguage
    English
  • Journal_Title
    Software Engineering, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    0098-5589
  • Type

    jour

  • DOI
    10.1109/32.256857
  • Filename
    256857