Regional Information Center for Science and Technology - Adaptive Anomaly Identification by Exploring Metric Subspace in Cloud Computing Infrastructures

DocumentCode :

2108626

Title :

Adaptive Anomaly Identification by Exploring Metric Subspace in Cloud Computing Infrastructures

Author :

Qiang Guan ; Song Fu

Author_Institution :

Dept. of Comput. Sci. & Eng., Univ. of North Texas, Denton, TX, USA

fYear :

2013

fDate :

Sept. 30 2013-Oct. 3 2013

Firstpage :

205

Lastpage :

214

Abstract :

Cloud computing has become increasingly popular by obviating the need for users to own and maintain complex computing infrastructures. However, due to their inherent complexity and large scale, production cloud computing systems are prone to various runtime problems caused by hardware and software faults and environmental factors. Autonomic anomaly detection is a crucial technique for understanding emergent, cloud-wide phenomena and self-managing cloud resources for system-level dependability assurance. To detect anomalous cloud behaviors, we need to monitor the cloud execution and collect runtime cloud performance data. These data consist of values of performance metrics for different types of failures, which display different correlations with the performance metrics. In this paper, we present an adaptive anomaly identification mechanism that explores the most relevant principal components of different failure types in cloud computing infrastructures. It integrates the cloud performance metric analysis with filtering techniques to achieve automated, efficient, and accurate anomaly identification. The proposed mechanism adapts itself by recursively learning from the newly verified detection results to refine future detections. We have implemented a prototype of the anomaly identification system and conducted experiments in an on-campus cloud computing environment and by using the Google data center traces. Our experimental results show that our mechanism can achieve more efficient and accurate anomaly detection than other existing schemes.

Keywords :

cloud computing; fault tolerant computing; formal verification; learning (artificial intelligence); resource allocation; system monitoring; system recovery; Google data center traces; adaptive anomaly identification; adaptive mechanism; anomalous cloud behavior detection; anomaly identification system; cloud computing infrastructure; cloud execution monitor; cloud performance metric analysis; complex computing infrastructure; complexity; detection result verification; emergent cloud-wide phenomena; environmental factor; failure type; filtering technique; hardware fault; metric subspace exploration; on-campus cloud computing environment; production cloud computing system; recursive learning; runtime cloud performance data collection; runtime problem; self-managing cloud resource; software fault; system-level dependability assurance; Cloud computing; Correlation; Measurement; Servers; Time series analysis; Virtual machine monitors; Virtual machining; Autonomic management; Cloud computing; Dependable systems; Failure detection; Learning algorithms;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Reliable Distributed Systems (SRDS), 2013 IEEE 32nd International Symposium on

Conference_Location :

Braga

Type :

conf

DOI :

10.1109/SRDS.2013.29

Filename :

6656276

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2108626