Title :
Learning to self-recover
Author :
Reidemeister, Thomas ; Jiang, Miao ; Ward, Paul A S
Author_Institution :
Shoshin Res. Group, Univ. of Waterloo, Waterloo, ON, Canada
Abstract :
Business success is contingent on dependable, yet affordable, software systems; this implies a need for self-recovering cloud-based component software systems. In prior work, we demonstrated a discrete controller that allows scheduling of recovery actions based on uncertain fault knowledge. That approach required detailed analysis of historical failure data. In this paper, we examine adaptive learning through active exploration and demonstrate the impact of drifting or invalid knowledge about recovery actions.
Keywords :
cloud computing; learning (artificial intelligence); system recovery; cloud-based component software; discrete controller; failure data; learning; scheduling; self-recovery; Analytical models; Argon; Computer crashes; Integrated circuits; Presses
Conference_Title :
Integrated Network Management (IM), 2011 IFIP/IEEE International Symposium on
Conference_Location :
Dublin
Print_ISBN :
978-1-4244-9219-0
Electronic_ISBN :
978-1-4244-9220-6
DOI :
10.1109/INM.2011.5990506