DocumentCode :
1081641
Title :
Using multi-stage and stratified sampling for inferring fault-coverage probabilities
Author :
Constantinescu, Cristian
Author_Institution :
Duke Univ., Durham, NC, USA
Volume :
44
Issue :
4
fYear :
1995
fDate :
12/1/1995 12:00:00 AM
Firstpage :
632
Lastpage :
639
Abstract :
Development of fault-tolerant computing systems requires accurate reliability modeling. Analytic, simulation, and hybrid models are commonly used for obtaining reliability measures. These measures are functions of component failure rates and fault-coverage probabilities. Coverage provides information about the fault and error detection, isolation, and system recovery capabilities. This parameter can be derived by physical or simulated fault injection. Statistical inference has been used to extract meaningful information from sample observations. This paper addresses the problem of conducting fault injection experiments and statistically inferring the coverage from the information gathered in those experiments. We perform statistical experiments in a multi-dimensional space of events. In this way, all major factors that influence the coverage (fault locations, timing characteristics of the fault, and the workload) are accounted for. Multi-stage, stratified, and combined multi-stage and stratified sampling are used for deriving the coverage. Equations for the mean, variance, and confidence interval of the coverage are provided. The statistical error produced by injected faults that do not induce errors in the tested system (also known as the nonresponse problem) is considered. A program that emulates a typical fault environment was developed, and four hypothetical systems are analyzed.
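To make the stratified-sampling idea concrete, here is a minimal sketch of a textbook stratified estimator for coverage. It is not taken from the paper and may differ from the paper's equations. Assumptions: the fault space (for example, fault location × workload combinations) is split into strata with known weights W_h that sum to 1. In each stratum, n_h faults are injected and x_h are covered. Strata are treated as effectively infinite, so no finite-population correction is applied. The confidence interval uses a normal approximation. The names `Stratum` and `stratified_coverage` are hypothetical, and the sketch does not model the nonresponse correction the paper discusses.

```python
from dataclasses import dataclass
from math import sqrt
from statistics import NormalDist


@dataclass
class Stratum:
    weight: float  # W_h: assumed (known) share of the fault space in this stratum
    injected: int  # n_h: number of faults injected in this stratum (needs >= 2)
    covered: int   # x_h: injected faults that were detected and recovered from


def stratified_coverage(strata, confidence=0.95):
    """Hypothetical helper: return (estimate, variance, (low, high)).

    estimate = sum_h W_h * p_h, with p_h = x_h / n_h
    variance = sum_h W_h^2 * p_h * (1 - p_h) / (n_h - 1)
    The interval is a normal approximation, clamped to [0, 1].
    """
    if abs(sum(s.weight for s in strata) - 1.0) > 1e-9:
        raise ValueError("stratum weights must sum to 1")
    estimate = 0.0
    variance = 0.0
    for s in strata:
        if s.injected < 2 or not 0 <= s.covered <= s.injected:
            raise ValueError("each stratum needs n_h >= 2 and 0 <= x_h <= n_h")
        p_h = s.covered / s.injected
        estimate += s.weight * p_h
        # Unbiased variance estimate of p_h, ignoring finite-population correction
        variance += s.weight ** 2 * p_h * (1 - p_h) / (s.injected - 1)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    half_width = z * sqrt(variance)
    low = max(0.0, estimate - half_width)
    high = min(1.0, estimate + half_width)
    return estimate, variance, (low, high)


if __name__ == "__main__":
    # Illustrative numbers only, not data from the paper
    strata = [Stratum(0.5, 100, 98), Stratum(0.3, 50, 45), Stratum(0.2, 40, 30)]
    est, var, (lo, hi) = stratified_coverage(strata)
    print(f"coverage ~ {est:.3f}, 95% CI ~ ({lo:.3f}, {hi:.3f})")
```

Weighting each stratum's observed coverage by its share of the fault space prevents heavily sampled regions from dominating the estimate. Allocating more injections to strata with high W_h or high within-stratum variability narrows the interval for a fixed total number of injections.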
Keywords :
fault tolerant computing; probability; reliability; reliability theory; system recovery; analytic models; component failure rates; confidence interval; error detection; fault detection; fault injection simulation; fault-coverage; fault-coverage probabilities; hybrid models; multi-stage sampling; nonresponse problem; reliability modeling; simulation models; statistical inference; stratified sampling; system recovery capabilities; timing characteristics; Analytical models; Computational modeling; Data mining; Fault detection; Fault location; Fault tolerant systems; Probability; Sampling methods; System recovery; Timing;
fLanguage :
English
Journal_Title :
Reliability, IEEE Transactions on
Publisher :
IEEE
ISSN :
0018-9529
Type :
jour
DOI :
10.1109/24.475993
Filename :
475993