DocumentCode
1855809
Title
Achieving reliability growth on real-time systems
Author
Lane, Christopher A. ; Morrison, Joseph D.
Author_Institution
IBM Corp., Rockville, MD, USA
fYear
1994
fDate
24-27 Jan 1994
Firstpage
136
Lastpage
141
Abstract
This paper addresses the principles used to predict and attain reliability growth on real-time systems. System reliability modeling techniques that include software reliability, maintenance effectiveness, and failure recovery are discussed in detail. Several software reliability growth models are discussed with emphasis on measured reliability growth of fielded software. The impact of maintenance effectiveness, which is a measure of the maintainer's skill and training levels, is shown. The need to develop and measure the robustness of failure recovery algorithms is emphasized in this paper. All of these factors are combined with the failure and repair characteristics of hardware to create comprehensive reliability growth models for real-time systems. Through the authors' research, they have determined that effective failure recovery algorithms are the key to attaining highly reliable systems. Without them, redundant computer systems that run banking and air traffic control systems will come crashing down with possibly disastrous results. The modeling and measurement techniques discussed in this paper provide the reliability practitioner with the methods to predict and achieve reliability growth resulting from improved software reliability and recovery algorithms. A fault tolerant system's ability to recover from hardware and software failures is gauged by a parameter called coverage. Coverage is the conditional probability of recovery given that a failure has occurred. Because of its huge impact on system reliability, the measurement of coverage is emphasized.
Keywords
Markov processes; fault tolerant computing; real-time systems; reliability theory; software maintenance; software reliability; system recovery; conditional probability of recovery; coverage; failure recovery; fault tolerant system; maintenance effectiveness; reliability growth; reliability modeling techniques; robustness; Air traffic control; Banking; Computer crashes; Hardware; Predictive models; Real time systems; Robustness; Software maintenance; Software measurement; Software reliability
fLanguage
English
Publisher
IEEE
Conference_Titel
Reliability and Maintainability Symposium, 1994. Proceedings., Annual
Conference_Location
Anaheim, CA
Print_ISBN
0-7803-1786-6
Type
conf
DOI
10.1109/RAMS.1994.291096
Filename
291096