DocumentCode :
2108626
Title :
Adaptive Anomaly Identification by Exploring Metric Subspace in Cloud Computing Infrastructures
Author :
Qiang Guan ; Song Fu
Author_Institution :
Dept. of Comput. Sci. & Eng., Univ. of North Texas, Denton, TX, USA
fYear :
2013
fDate :
Sept. 30 2013-Oct. 3 2013
Firstpage :
205
Lastpage :
214
Abstract :
Cloud computing has become increasingly popular by obviating the need for users to own and maintain complex computing infrastructures. However, due to their inherent complexity and large scale, production cloud computing systems are prone to various runtime problems caused by hardware and software faults and environmental factors. Autonomic anomaly detection is a crucial technique for understanding emergent, cloud-wide phenomena and for self-managing cloud resources to assure system-level dependability. To detect anomalous cloud behaviors, we need to monitor cloud execution and collect runtime performance data. These data consist of values of performance metrics, and different types of failures display different correlations with those metrics. In this paper, we present an adaptive anomaly identification mechanism that explores the most relevant principal components of the metric space for each failure type in cloud computing infrastructures. It integrates cloud performance metric analysis with filtering techniques to achieve automated, efficient, and accurate anomaly identification. The mechanism adapts itself by recursively learning from newly verified detection results to refine future detections. We have implemented a prototype of the anomaly identification system and conducted experiments in an on-campus cloud computing environment and with Google data center traces. Our experimental results show that our mechanism achieves more efficient and accurate anomaly detection than other existing schemes.
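The abstract describes the mechanism only at a high level. The following is a minimal sketch of the general idea, not the authors' algorithm, under these assumptions: (1) PCA is applied to per-sample vectors of performance metrics; (2) components are ranked by the absolute Pearson correlation between each component's projection and a binary failure label (0 = normal, 1 = failure), and the top m are kept as the metric subspace; (3) samples are classified by the nearest class centroid in that subspace; (4) "recursive learning" is approximated by a full refit after each operator-verified result. The class name `SubspaceAnomalyDetector`, the parameters `k` and `m`, and all helper names are hypothetical. The filtering techniques mentioned in the abstract are not modeled.

```python
import math
import random


def mean_center(rows):
    """Subtract the per-metric mean from every sample."""
    n, d = len(rows), len(rows[0])
    mu = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - mu[j] for j in range(d)] for r in rows], mu


def covariance(centered):
    """Sample covariance matrix of mean-centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def principal_components(cov, k, iters=500, seed=0):
    """Top-k eigenvectors of a symmetric matrix via power iteration + deflation."""
    rng = random.Random(seed)
    d = len(cov)
    a = [row[:] for row in cov]
    comps = []
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(d)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(a[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # remaining variance is numerically zero
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * a[i][j] * v[j] for i in range(d) for j in range(d))
        comps.append(v)
        # Deflate so the next pass converges to the next-largest component.
        a = [[a[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return comps


def project(row, mu, comp):
    """Coordinate of a sample along one principal component."""
    return sum((x - m) * c for x, m, c in zip(row, mu, comp))


def pearson(xs, ys):
    """Pearson correlation; 0.0 when either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return 0.0 if sx == 0 or sy == 0 else sxy / (sx * sy)


class SubspaceAnomalyDetector:
    """Hypothetical sketch: PCA, keep the m components most correlated with
    the failure label, then classify by nearest class centroid."""

    def __init__(self, k=3, m=1):
        self.k, self.m = k, m
        self.rows, self.labels = [], []

    def fit(self, rows, labels):
        self.rows = [list(r) for r in rows]
        self.labels = list(labels)
        self._refit()
        return self

    def _coords(self, row):
        return [project(row, self.mu, c) for c in self.subspace]

    def _refit(self):
        if set(self.labels) != {0, 1}:
            raise ValueError("need both normal (0) and failure (1) samples")
        centered, self.mu = mean_center(self.rows)
        comps = principal_components(covariance(centered), self.k)

        def relevance(c):  # |corr(projection onto c, failure label)|
            scores = [project(r, self.mu, c) for r in self.rows]
            return abs(pearson(scores, self.labels))

        self.subspace = sorted(comps, key=relevance, reverse=True)[: self.m]
        self.centroids = {}
        for label in (0, 1):
            pts = [self._coords(r) for r, y in zip(self.rows, self.labels) if y == label]
            self.centroids[label] = [sum(p[i] for p in pts) / len(pts) for i in range(self.m)]

    def predict(self, row):
        """Return 1 (anomaly) or 0 (normal) by nearest centroid in the subspace."""
        x = self._coords(row)
        dist = {lab: sum((a - b) ** 2 for a, b in zip(x, c))
                for lab, c in self.centroids.items()}
        return min(dist, key=dist.get)

    def update(self, row, verified_label):
        """Adaptive step (simplified): add a verified sample and refit."""
        self.rows.append(list(row))
        self.labels.append(verified_label)
        self._refit()
```

As a usage example with made-up data, three metrics are used where failures mainly shift the third metric. The selected component then aligns with that metric, so `det.predict([12, 52, 29])` returns 1 and `det.predict([11, 50, 5])` returns 0 after fitting on `normal = [[10 + i % 5, 50 + 2 * (i % 3), 5 + 0.1 * (i % 2)] for i in range(20)]` and `failed = [[12, 52, 30 + i % 3] for i in range(5)]`. The full refit in `update` is chosen for clarity. An incremental PCA update would be closer in spirit to recursive learning and cheaper on large traces.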
Keywords :
cloud computing; fault tolerant computing; formal verification; learning (artificial intelligence); resource allocation; system monitoring; system recovery; Google data center traces; adaptive anomaly identification; adaptive mechanism; anomalous cloud behavior detection; anomaly identification system; cloud computing infrastructure; cloud execution monitor; cloud performance metric analysis; complex computing infrastructure; complexity; detection result verification; emergent cloud-wide phenomena; environmental factor; failure type; filtering technique; hardware fault; metric subspace exploration; on-campus cloud computing environment; production cloud computing system; recursive learning; runtime cloud performance data collection; runtime problem; self-managing cloud resource; software fault; system-level dependability assurance; Cloud computing; Correlation; Measurement; Servers; Time series analysis; Virtual machine monitors; Virtual machining; Autonomic management; Cloud computing; Dependable systems; Failure detection; Learning algorithms;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Reliable Distributed Systems (SRDS), 2013 IEEE 32nd International Symposium on
Conference_Location :
Braga
Type :
conf
DOI :
10.1109/SRDS.2013.29
Filename :
6656276