Title :
A longitudinal survey of Internet host reliability
Author :
Long, Darrell ; Muir, Andrew ; Golding, Richard
Author_Institution :
Dept. of Comput. & Inf. Sci., California Univ., Santa Cruz, CA, USA
Abstract :
An accurate estimate of host reliability is important for correct analysis of many fault-tolerance and replication mechanisms. In a previous study, we estimated host system reliability by querying a large number of hosts to find how long they had been functioning, estimating the mean time-to-failure (MTTF) and availability from those measures, and in turn deriving an estimate of the mean time-to-repair (MTTR). However, this approach had a bias towards more reliable hosts that could result in overestimating MTTR and underestimating availability. To address this bias we have conducted a second experiment using a fault-tolerant replicated monitoring tool. This tool directly measures TTF, TTR, and availability by polling many sites frequently from several locations. We find that these more accurate results generally confirm and improve our earlier estimates, particularly for TTR. We also find that failure and repair are unlikely to follow Poisson processes.
Keywords :
Internet; fault tolerant computing; reliability; Internet host reliability; fault-tolerance; fault-tolerant replicated monitoring; host reliability; host system reliability; replication; Access protocols; Availability; Condition monitoring; Failure analysis; Fault tolerance; Fault tolerant systems; Hardware; Information analysis; Power system reliability
Conference_Title :
Reliable Distributed Systems, 1995. Proceedings., 14th Symposium on
Conference_Location :
Bad Neuenahr
Print_ISBN :
0-8186-7153-X
DOI :
10.1109/RELDIS.1995.518718