Title :
Online failure prediction in cloud datacenters by real-time message pattern learning
Author :
Watanabe, Yoshihiro ; Otsuka, Hiroyuki ; Sonoda, M. ; Kikuchi, Shinji ; Matsumoto, Yuki
Author_Institution :
Cloud Comput. Res. Center, FUJITSU Labs. Ltd., Kawasaki, Japan
Abstract :
Once failures occur in a cloud datacenter accommodating a large number of virtual resources, they tend to spread rapidly and widely, affecting many cloud users (tenant owners). One of the best ways to prevent a failure from spreading through the system is to identify signs of the failure before it occurs and to deal with it proactively before it causes serious problems. Although several approaches have been proposed that predict failures by analyzing past system message logs and identifying the relationship between the messages and the failures, automatic failure prediction remains difficult for several reasons, such as the variety of log message formats and the time gap between learning message patterns and applying the identified patterns in real systems. Based on this understanding, we propose a new failure prediction method that automatically learns message patterns as signs of failure by classifying messages according to their similarity, independent of their format, and that re-learns message patterns in frequently changed configurations. We implemented our failure prediction method and evaluated it using system log data recorded in an actual cloud datacenter. The experimental results show that our approach predicted failures with 80% precision and covered 90% of failure occurrences.
Keywords :
cloud computing; computer centres; failure analysis; learning (artificial intelligence); message passing; real-time systems; automatic failure prediction; cloud data centers; frequently-changed configurations; log message formats; message classification; message patterns relearning; online failure prediction; past system message logs analysis; real-time message pattern learning; sign identification; time gaps; virtual resources; Cloud computing; Conferences; Decision support systems; Frequency modulation; Handheld computers; Bayesian probability; Operation and management; cloud computing; failure prediction; message pattern learning;
Conference_Title :
Cloud Computing Technology and Science (CloudCom), 2012 IEEE 4th International Conference on
Conference_Location :
Taipei
Print_ISBN :
978-1-4673-4511-8
Electronic_ISBN :
978-1-4673-4509-5
DOI :
10.1109/CloudCom.2012.6427566
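Illustrative_Sketch :
The abstract describes classifying log messages by similarity regardless of format and learning which message patterns precede failures. The following is only a minimal sketch of that general idea, not the authors' implementation. The Jaccard word-overlap measure, the 0.5 threshold, the class names MessageClassifier and PatternLearner, the sample log lines, and the plain frequency estimate of the failure probability are all assumptions introduced here for illustration; the record does not specify any of them.

```python
from collections import Counter


def tokenize(message):
    """Split a raw log line into a set of lowercase words (format-agnostic)."""
    return set(message.lower().split())


def jaccard(a, b):
    """Word-overlap similarity between two token sets, in [0, 1]."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


class MessageClassifier:
    """Hypothetical classifier: groups messages into types by similarity."""

    def __init__(self, threshold=0.5):
        self.threshold = threshold      # assumed value, not from the paper
        self.representatives = []       # one token set per message type

    def classify(self, message):
        tokens = tokenize(message)
        for type_id, rep in enumerate(self.representatives):
            if jaccard(tokens, rep) >= self.threshold:
                return type_id
        # No similar type seen yet: register a new one on the fly.
        self.representatives.append(tokens)
        return len(self.representatives) - 1


class PatternLearner:
    """Hypothetical learner: tracks how often a pattern precedes a failure."""

    def __init__(self):
        self.seen = Counter()
        self.before_failure = Counter()

    def update(self, pattern, followed_by_failure):
        # Counts are updated incrementally, so patterns can be re-learned
        # as new log windows arrive.
        self.seen[pattern] += 1
        if followed_by_failure:
            self.before_failure[pattern] += 1

    def failure_probability(self, pattern):
        # Simple frequency estimate of P(failure | pattern).
        n = self.seen[pattern]
        return self.before_failure[pattern] / n if n else 0.0


def to_pattern(classifier, messages):
    """A pattern is the set of message types observed in one time window."""
    return frozenset(classifier.classify(m) for m in messages)


classifier = MessageClassifier(threshold=0.5)
learner = PatternLearner()

# Made-up log windows, each labeled with whether a failure followed.
history = [
    (["disk sda1 read error sector 100", "raid array degraded on host a"], True),
    (["disk sdb2 read error sector 734", "raid array degraded on host b"], True),
    (["user alice logged in from console"], False),
]
for messages, failed in history:
    learner.update(to_pattern(classifier, messages), failed)

new_window = ["disk sdc3 read error sector 9", "raid array degraded on host c"]
risk = learner.failure_probability(to_pattern(classifier, new_window))
print(f"estimated failure probability: {risk:.2f}")
```

In this sketch a new message type is created whenever no existing type is similar enough, so no per-format parser is needed, and the counters can keep being updated as the system configuration changes. A real deployment would need a time-windowing policy and a prediction threshold, both of which are left out here.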