DocumentCode :
2284390
Title :
Eigen space based method for detecting faulty nodes in large scale enterprise systems
Author :
Agarwal, Manoj K.
Author_Institution :
IBM India Res. Labs., New Delhi
fYear :
2008
fDate :
7-11 April 2008
Firstpage :
224
Lastpage :
231
Abstract :
In modern enterprise systems, detecting the anomaly behind a performance degradation is a hard problem. In such replicated environments, there can be hundreds or even thousands of server nodes for a single application. These nodes have implicit as well as explicit interdependencies. Furthermore, because nodes in the cluster have heterogeneous capacities, the same fault may produce vastly different effects on the monitored metrics of different nodes. When a performance problem occurs, finding the faulty node(s) is a tedious and time-consuming exercise, given constantly changing workload, topology, and SLA requirements. In this paper we present a novel eigen space based technique to detect anomalies in an enterprise environment without any extra monitoring overhead. We monitor certain metrics on each node in the cluster that are already available in an enterprise environment. The only historical information we need is a small number of the most recent samples of each monitored metric. Our technique adapts well to dynamic conditions, is simple to operate, and, in case of an anomaly, automatically produces a list of faulty node(s). We have implemented this method in a 3-tier cluster environment with a total of 13 nodes and tested our algorithm by introducing faults in the front tier, middle tier, and backend tier. Our method is always able to separate out faulty nodes with high accuracy and precision.
Keywords :
business data processing; eigenvalues and eigenfunctions; large-scale systems; security of data; anomaly detection; eigen space; enterprise environment; faulty node detection; large scale enterprise system; Art; Clustering algorithms; Condition monitoring; Degradation; Delay; Fault detection; Fault diagnosis; Large-scale systems; Testing; Topology; eigen space analysis; problem detection; system management;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Network Operations and Management Symposium, 2008. NOMS 2008. IEEE
Conference_Location :
Salvador, Bahia
ISSN :
1542-1201
Print_ISBN :
978-1-4244-2065-0
Electronic_ISBN :
1542-1201
Type :
conf
DOI :
10.1109/NOMS.2008.4575138
Filename :
4575138