Title :
Towards A Model-Based Autonomic Reliability Framework for Computing Clusters
Author :
Dubey, Abhishek ; Nordstrom, Steve ; Keskinpala, Turker ; Neema, Sandeep ; Bapty, Ted ; Karsai, Gabor
Date :
March 31 2008-April 4 2008
Abstract :
One of the primary problems with computing clusters is ensuring that they maintain a reliable working state most of the time, so as to justify the economics of operation. In this paper, we introduce a model-based hierarchical reliability framework that enables periodic monitoring of vital health parameters across the cluster and provides for autonomic fault mitigation. We also discuss some of the challenges faced by autonomic reliability frameworks in cluster environments, such as non-determinism in task scheduling in standard operating systems such as Linux and the need for synchronized execution of monitoring sensors across the cluster. Additionally, we present a solution to these problems in the context of our framework, which uses a feedback-controller-based approach to compensate for scheduling jitter in non-real-time operating systems. Finally, we present experimental data that illustrate the effectiveness of our approach.
Keywords :
Adaptive control; Computer networks; Environmental economics; Hardware; Jitter; Linux; Maintenance; Operating systems; Quantum computing; Sensor systems; Autonomic Computing; Cluster Computing; Model Integrated Computing; Model-Based Design; Reliability
Conference_Title :
Engineering of Autonomic and Autonomous Systems, 2008. EASE 2008. Fifth IEEE Workshop on
Conference_Location :
Belfast, Northern Ireland
Print_ISBN :
0-7695-3140-7
DOI :
10.1109/EASe.2008.15