DocumentCode :
3601333
Title :
Predicting Transient Downtime in Virtual Server Systems: An Efficient Sample Path Randomization Approach
Author :
Du, Anna Ye ; Das, Sanjukta ; Yang, Zhouhan ; Qiao, Chunming ; Ramesh, R.
Author_Institution :
Dept. of Manage. Sci. & Syst., State Univ. of New York, Buffalo, NY, USA
Volume :
64
Issue :
12
fYear :
2015
Firstpage :
3541
Lastpage :
3554
Abstract :
A central challenge in developing cloud datacenter Service Level Agreements is the estimation of downtime distribution of a set of provisioned servers over a service window, which is compounded by three facts. First, while steady-state probabilities have been derived for birth-death processes involving server failures and repairs, they could be highly inaccurate under transience. Furthermore, steady-state cannot be assured under typical service windows. Therefore, estimation of transient distributions is essential. Second, the processes of failures and repairs may follow any distribution and hence need to be extracted using system log data and modeled using appropriate general distributions. Third, downtime distributions over service windows depend on the number of servers and their deployment structure for a contract. We develop an efficient and generalized sample path randomization approach to precisely estimate transient probabilities under three different checkpointing strategies and three flexible failure distribution models. The estimators are unbiased, consistent, efficient and sufficient. Their asymptotic convergence is established. The estimation algorithms are computationally efficient in solving practical problems and yield rich information on transient system behaviors. The methodology is general and extensible to various server failure and repair processes characterized using birth-death modeling.
Keywords :
checkpointing; cloud computing; computer centres; contracts; probability; virtualisation; asymptotic convergence; birth-death modeling; birth-death processes; checkpointing strategies; cloud datacenter service level agreements; contract deployment structure; downtime distribution estimation; flexible failure distribution models; sample path randomization approach; server failure process; server repair process; service windows; steady-state probabilities; transient distribution estimation; transient downtime prediction; transient probabilities; transient system behaviors; virtual server systems; Computational modeling; Maintenance engineering; Markov chains; Predictive models; Virtualization; fault-tolerant systems; virtual infrastructure
fLanguage :
English
Journal_Title :
Computers, IEEE Transactions on
Publisher :
IEEE
ISSN :
0018-9340
Type :
jour
DOI :
10.1109/TC.2015.2394437
Filename :
7038208