DocumentCode :
2898316
Title :
Evaluating availability under quasi-heavy-tailed repair times
Author :
Kato, Sei ; Osogami, Takayuki
Author_Institution :
Tokyo Res. Lab., IBM Res., Yamato
fYear :
2008
fDate :
24-27 June 2008
Firstpage :
442
Lastpage :
451
Abstract :
The time required to recover from failures has a great impact on the availability of information technology (IT) systems. We define quasi-heavy-tailed distributions as the class of probability distributions for which the time series graph of the sample mean shows intermittent jumps within a given period. We find that the distribution of repair time is quasi-heavy-tailed for three IT systems: an in-house system hosted by IBM, a high-performance computing system at the Los Alamos National Laboratory, and a distributed-memory computer at the National Energy Research Scientific Computing Center. This means that the mean time to repair estimated from incidents observed within a certain period could change dramatically if incidents are observed for a further period; in other words, the estimated mean time to repair fluctuates widely over time. As a result, classical metrics based on the mean time to repair are not optimal for evaluating the availability of these systems. We propose to evaluate the availability of IT systems with the T-year return value, estimated using extreme value theory. The T-year return value is the value that the repair time exceeds, on average, once every T years. We find that the T-year return value is a sound metric of the availability of the three IT systems.
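The two quantities named in the abstract can be illustrated with a short Python sketch. The abstract does not say which extreme-value estimator the authors use, so this example substitutes a common textbook choice as an assumption: the Hill estimator of the tail index combined with the Weissman quantile extrapolation. The function names, the parameter `k` (number of upper order statistics used), and `incidents_per_year` are hypothetical and not taken from the paper.

```python
import math


def running_mean(samples):
    """Sample mean after each observation; jumps in this series hint at a heavy tail."""
    means, total = [], 0.0
    for i, x in enumerate(samples, start=1):
        total += x
        means.append(total / i)
    return means


def hill_tail_index(samples, k):
    """Hill estimate of the extreme value index from the k largest observations.

    Assumption: repair times are positive and have a Pareto-like upper tail.
    """
    xs = sorted(samples)
    if not 1 <= k < len(xs):
        raise ValueError("k must satisfy 1 <= k < len(samples)")
    threshold = xs[-(k + 1)]  # (k+1)-th largest value acts as the tail threshold
    return sum(math.log(x / threshold) for x in xs[-k:]) / k


def return_value(samples, k, incidents_per_year, t_years):
    """Weissman-type estimate of the repair time exceeded on average once per t_years.

    Assumption: incidents arrive at a constant rate of incidents_per_year,
    so the target exceedance probability per incident is 1 / (rate * t_years).
    """
    xs = sorted(samples)
    n = len(xs)
    gamma = hill_tail_index(samples, k)
    p = 1.0 / (incidents_per_year * t_years)
    return xs[-(k + 1)] * (k / (n * p)) ** gamma
```

For example, `return_value(repair_hours, k=50, incidents_per_year=200, t_years=1)` would give the repair time (in hours) expected to be exceeded about once a year, under the stated assumptions. The result is sensitive to the choice of `k`, which in practice is tuned by inspecting how the estimate varies across a range of values.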
Keywords :
software maintenance; statistical distributions; system recovery; Los Alamos National Laboratory; National Energy Research Scientific Computing Center; distributed memory computer; extreme value theory; information technology; intermittent jumps; quasi-heavy-tailed repair times; Distributed computing; Fluctuations; High performance computing; Information technology; Laboratories; Probability distribution; Robustness; Scientific computing; Statistical distributions; Time measurement
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Dependable Systems and Networks With FTCS and DCC, 2008. DSN 2008. IEEE International Conference on
Conference_Location :
Anchorage, AK
Print_ISBN :
978-1-4244-2397-2
Electronic_ISBN :
978-1-4244-2398-9
Type :
conf
DOI :
10.1109/DSN.2008.4630115
Filename :
4630115