DocumentCode
2149723
Title
An extensible framework for repair-driven monitoring
Author
Reidemeister, Thomas ; Jiang, Miao ; Ward, Paul A S
Author_Institution
E&CE Dept., Univ. of Waterloo, Waterloo, ON, Canada
fYear
2010
fDate
25-29 Oct. 2010
Firstpage
142
Lastpage
149
Abstract
In recent years, autonomic computing, and specifically autonomic data centre management, has gained significant attention. Human intervention must be minimized to reduce the operating costs of business applications. In this paper, we focus on the self-repair dimension and present a flexible probabilistic framework for developing self-repair agents for business-information-system components. Our framework seeks to pick the optimal sequence of repair actions given only imperfect information about the experienced fault. In contrast to existing recovery-oriented approaches, our model explicitly considers fault prevalence, symptoms of recurrent failures, and inclusive repair actions. We evaluate our proposal using discrete event simulation. Our evaluation shows that an optimal repair policy can be computed from a brief specification of repair actions. Even when error detection is very unreliable, our controller is able to estimate the current state of the monitored system and recover from failure.
Keywords
business data processing; computer centres; cost reduction; discrete event simulation; fault tolerant computing; information systems; multi-agent systems; probability; system monitoring; agents; autonomic computing; autonomic data centre management; business application; business-information-system component; discrete event simulation; extensible framework; fault prevalence; flexible probabilistic framework; human intervention; inclusive repair actions; operating cost reduction; optimal repair policy; recurrent failure symptoms; repair-driven monitoring; self-repair dimension; system monitoring; Context; Fault diagnosis; Maintenance engineering; Markov processes; Mathematical model; Monitoring; Probes;
fLanguage
English
Publisher
ieee
Conference_Titel
Network and Service Management (CNSM), 2010 International Conference on
Conference_Location
Niagara Falls, ON
Print_ISBN
978-1-4244-8910-7
Electronic_ISBN
978-1-4244-8908-4
Type
conf
DOI
10.1109/CNSM.2010.5691320
Filename
5691320