Title :
Balancing system availability and lifetime with dynamic hidden Markov models
Author :
Panerati, Jacopo ; Abdi, Samar ; Beltrame, Giovanni
Author_Institution :
Dept. de Genie Inf. et Genie Logiciel, Ecole Polytech. de Montreal, Montréal, QC, Canada
Abstract :
Electronic components in space applications are subject to high levels of ionizing and particle radiation. Their lifetime is reduced by the former (especially at high levels of utilization), and transient errors might be caused by the latter. Transient errors can be detected and corrected using memory scrubbing. However, scrubbing introduces an overhead that reduces both the availability and the lifetime of the system. In this work, we present a mechanism based on dynamic hidden Markov models (D-HMMs) that balances the availability and lifetime of a multi-resource system by estimating the occurrence of permanent faults amid transient faults, and by dynamically migrating the computation to excess resources when a failure occurs. The dynamic nature of the model makes it adaptable to different mission profiles and fault rates. Results show that our model is able to lead systems to their desired lifetime while keeping availability within 2% of its ideal value, and that it outperforms static rule-based and traditional hidden Markov model (HMM) approaches.
Keywords :
avionics; failure analysis; fault diagnosis; fault tolerant computing; hidden Markov models; integrated circuit reliability; radiation hardening (electronics); system recovery; D-HMM; dynamic hidden Markov models; electronic components; memory scrubbing; multiresource system; particle radiation; permanent faults; transient errors; transient faults; Hidden Markov models; Transient analysis
Conference_Title :
Adaptive Hardware and Systems (AHS), 2014 NASA/ESA Conference on
Conference_Location :
Leicester
DOI :
10.1109/AHS.2014.6880183