• DocumentCode
    748236
  • Title
    System reliability analysis of an N-version programming application
  • Author
    Dugan, Joanne Bechta; Lyu, Michael R.
  • Author_Institution
    Dept. of Electr. Eng., Virginia Univ., Charlottesville, VA, USA
  • Volume
    43
  • Issue
    4
  • fYear
    1994
  • fDate
    12/1/1994
  • Firstpage
    513
  • Lastpage
    519
  • Abstract
    This paper presents a quantitative reliability analysis of a system designed to tolerate both hardware and software faults. The system achieves integrated fault tolerance by implementing N-version programming (NVP) on redundant hardware. The analysis considers unrelated software faults, related software faults, transient hardware faults, permanent hardware faults, and imperfect coverage. The overall model is a Markov model in which the states of the Markov chain represent the long-term evolution of the system structure. For each operational configuration, a fault-tree model captures the effects of software faults and transient hardware faults on the task computation. The software fault model is parameterized using experimental data from a recent implementation of an NVP system built with the current design paradigm. The hardware model is parameterized using typical failure rates for hardware faults and typical coverage parameters. The authors' results show that it is important to consider both hardware and software faults in the reliability analysis of an NVP system, since these estimates vary with time. Moreover, the error detection and recovery function is extremely important in fault-tolerant software: system unreliability can be reduced by several orders of magnitude if this function is provided promptly.
  • Keywords
    Markov processes; fault tolerant computing; fault trees; programming; reliability; software fault tolerance; Markov chain; N-version programming; design paradigm; error detection; error recovery; failure rates; fault-tolerant software; fault-tree model; imperfect coverage; permanent hardware faults; reliability analysis; software faults; transient hardware faults; Aerospace control; Application software; Communication system software; Computer errors; Fault tolerant systems; Hardware; Mathematical programming; Patient monitoring; Reliability; Software design;
  • fLanguage
    English
  • Journal_Title
    Reliability, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    0018-9529
  • Type
    jour
  • DOI
    10.1109/24.370232
  • Filename
    370232
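The two-level modeling idea summarized in the Abstract (a Markov chain over hardware configurations, with a per-configuration model of task failure) can be illustrated with a minimal sketch. It assumes a hypothetical duplex (two-processor) hardware configuration with per-processor failure rate `lam` and coverage `c`, and assigns each operational state an assumed per-task failure probability `q_task[i]` that stands in for the output of a fault-tree model. The function names, the three-state chain, and all parameter values are illustrative assumptions, not the authors' model; transient hardware faults and related software faults are omitted. Transient state probabilities are computed by uniformization.

```python
import math


def transient_probs(Q, p0, t, tol=1e-12, max_terms=100_000):
    """Transient state probabilities p(t) of a CTMC with generator Q (uniformization).

    Assumes lam * t is moderate: exp(-lam * t) underflows for very large values.
    """
    n = len(Q)
    lam = max(-Q[i][i] for i in range(n))  # uniformization rate
    if lam == 0.0 or t == 0.0:
        return list(p0)
    # Embedded DTMC P = I + Q / lam (row-stochastic because lam >= every exit rate)
    P = [[(1.0 if i == j else 0.0) + Q[i][j] / lam for j in range(n)] for i in range(n)]
    term = list(p0)              # p0 * P^k, starting at k = 0
    weight = math.exp(-lam * t)  # Poisson probability of k = 0 jumps
    result = [weight * x for x in term]
    mass = weight
    k = 0
    # Add Poisson-weighted terms until the neglected Poisson mass is below tol
    while 1.0 - mass > tol and k < max_terms:
        k += 1
        term = [sum(term[i] * P[i][j] for i in range(n)) for j in range(n)]
        weight *= lam * t / k
        mass += weight
        result = [r + weight * x for r, x in zip(result, term)]
    return result


def duplex_generator(lam, c):
    """Hypothetical duplex hardware model with per-processor failure rate lam and coverage c.

    States: 0 = both processors up, 1 = one processor up, 2 = system failed (absorbing).
    A covered first failure (probability c) leads to state 1; an uncovered one crashes the system.
    """
    return [
        [-2.0 * lam, 2.0 * lam * c, 2.0 * lam * (1.0 - c)],
        [0.0, -lam, lam],
        [0.0, 0.0, 0.0],
    ]


def system_reliability(lam, c, q_task, t):
    """Probability that the hardware is operational at time t and the task in that state succeeds.

    q_task[i] is an assumed per-task failure probability for operational state i,
    standing in for the result of a fault-tree model of that configuration
    (task failure is assumed independent of how the state was reached).
    """
    p = transient_probs(duplex_generator(lam, c), [1.0, 0.0, 0.0], t)
    return sum(p[i] * (1.0 - q_task[i]) for i in range(len(q_task)))


# Example: lam = 1e-4 per hour, coverage 0.99, mission time 1000 hours
print(system_reliability(1e-4, 0.99, [1e-6, 1e-4], 1000.0))
```

For this three-state chain the uniformized result can be checked against the closed form P0(t) = exp(-2·lam·t) and P1(t) = 2c(exp(-lam·t) - exp(-2·lam·t)). Lowering `c` in the example shows how imperfect coverage drives unreliability, which is the kind of sensitivity the Abstract reports for error detection and recovery.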