Author_Institution :
HP-Autonomy Res., Sunnyvale, CA, USA
Abstract :
Consider a two-component replica system having constant repair time for faults occurring on either component. Such a system models, for example, a RAID-4/5 storage array where two disks are mirrors of each other, and faults on one are corrected using data from the other. The object of our study is the loss function: namely, the distribution function of losses that result from a failure on the dual component during the time that an error in the first component is being repaired. Most previous studies of such systems make certain simplifying assumptions in order to reduce them to a Markov system. The benefit of such a simplification is that the mean of the simplified loss function can be readily obtained. An example of such a mean is the frequently used "Mean Time to Data Loss" (MTTDL) for RAID arrays. Such measures are then typically included in an empirical study using data gathered from actual systems. The disadvantage of such an approach is that we lose access to the actual distribution, which could potentially encode important information. In this work, we present a theoretical study of a two-component replica system. The target of the study is the entire loss function: namely, the probability density function (pdf) of the loss event. The obstacle to computing this pdf is that the underlying transition diagram is no longer a Markov chain with exponential transition probabilities. We compute the entire pdf of the loss function. In order to obtain this pdf, we use techniques from transform theory. Once the loss function is derived, information such as the number of times each component goes through repair, the differences in error probabilities at various time periods, etc. may be obtained from it. The exponential transitions in a Markov system are memoryless. There has been considerable debate regarding the suitability of a memoryless assumption on failures in a real-world system.
Accordingly, we investigate the memoryless property at various stages of our derivation of the pdf of the loss function. We identify precisely the stages in the derivation of the loss function where a memoryless assumption is made. As expected, the final loss function for the dual-component scheme is not memoryless. However, somewhat surprisingly, it has a natural estimation using a memoryless loss function.
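The setting described in the abstract can be illustrated with a small Monte Carlo sketch. This is not the transform-based derivation used in the paper. It is a minimal simulation under assumptions that are not stated in the abstract: each component faults independently at a constant rate `lam` (exponential inter-fault times), a repair always takes exactly `repair_time`, and a repaired component is as good as new. The function names are hypothetical.

```python
import math
import random


def simulate_time_to_loss(lam, repair_time, rng):
    """Draw one sample of the time until a loss event.

    Assumed model (not taken from the paper): both components fault at
    rate `lam`; a loss occurs if the partner faults while the first
    faulty component is still under its constant-time repair.
    """
    t = 0.0
    while True:
        # Both components healthy: the first fault arrives at rate 2*lam.
        t += rng.expovariate(2.0 * lam)
        # Time until the partner faults, measured from the start of repair.
        partner_fault = rng.expovariate(lam)
        if partner_fault < repair_time:
            return t + partner_fault  # loss during the repair window
        t += repair_time  # repair finished; both components healthy again


def mean_time_to_loss(lam, repair_time):
    """Closed-form mean of the simulated loss time under the same assumptions.

    With p = 1 - exp(-lam * repair_time) the per-repair loss probability,
    summing the expected cycle lengths gives 1 / (2 * lam * p) + 1 / lam.
    """
    p = 1.0 - math.exp(-lam * repair_time)
    return 1.0 / (2.0 * lam * p) + 1.0 / lam


def estimate_mean(lam, repair_time, samples, seed=0):
    """Average many simulated loss times using a seeded generator."""
    rng = random.Random(seed)
    total = sum(simulate_time_to_loss(lam, repair_time, rng) for _ in range(samples))
    return total / samples
```

Collecting the individual samples from `simulate_time_to_loss` in a histogram, rather than only averaging them, gives an empirical view of the full loss-time distribution that a single mean such as the MTTDL does not convey.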
Keywords :
Markov processes; RAID; fault tolerant computing; MTTDL; Markov system; RAID-4/5 storage array; constant repair time; error probabilities; exponential transition probabilities; loss event; loss functions; mean-time-to-data-loss; memoryless assumption; paired-replicas; probability density function; transform theory; transition diagram; two-component replica system; Distribution functions; Exponential distribution; Maintenance engineering; Random variables; Reliability