DocumentCode :
3026094
Title :
Proactive fault handling for system availability enhancement
Author :
Salfner, Felix ; Malek, Miroslaw
Author_Institution :
Dept. of Comput. Sci., Humboldt-Univ., Berlin, Germany
fYear :
2005
fDate :
4-8 April 2005
Abstract :
Proactive fault handling combines prevention and repair actions with failure prediction techniques. We extend the standard availability formula by five key measures: (1) precision and (2) recall assess failure prediction, while failure handling is gauged by (3) prevention probability, (4) repair time improvement, and (5) risk of introducing additional failures. We give a short survey of actions that are suited to be combined with failure prediction and provide a procedure to estimate the five key measures. Altogether, this makes it possible to quantify the impact of proactive fault handling on system availability and may provide valuable input for system design.
Keywords :
failure analysis; fault tolerant computing; probability; system recovery; failure prediction techniques; prevention probability; proactive fault handling; repair time improvement; standard availability formula; system availability enhancement; Computer science; Counting circuits; Distributed processing; Equations; Measurement standards; Prediction methods; Preventive maintenance; Processor scheduling; State estimation; Time measurement;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Parallel and Distributed Processing Symposium, 2005. Proceedings. 19th IEEE International
Print_ISBN :
0-7695-2312-9
Type :
conf
DOI :
10.1109/IPDPS.2005.360
Filename :
1420243