DocumentCode :
262306
Title :
Quantifying Failure Risk of Version Switch for Rolling Upgrade on Clouds
Author :
Sun, Daniel; Bass, Len; Fekete, Alan; Gramoli, Vincent; Tran, An Binh; Xu, Sherry; Zhu, Liming
Author_Institution :
Software Syst. Res. Group, NICTA, Sydney, NSW, Australia
fYear :
2014
fDate :
3-5 Dec. 2014
Firstpage :
175
Lastpage :
182
Abstract :
Rolling upgrade is an industry technique for online dynamic software update. A rolling upgrade updates a small number of instances from the old version to the new version at a time, and the operation is repeated in a rolling wave until all instances have been upgraded. In many cases, the software must avoid interactions between different versions. One common, simple approach is to make instances version-aware and to choose a version switch point at which the old service is deactivated and the new service is activated. On a cloud platform, an upgrade can be implemented simply by replacing old virtual machine instances with instances running the new version, and various failures may occur during the rolling upgrade. If an instance fails, a replacement has to be launched from a backup image, which in most software systems is still in the old version; for the sake of reliability and stability, the backup images cannot simply be switched to the new version until the new software and the new service have proven stable. Thus the progress of the rolling upgrade is not guaranteed, and indeed the number of upgraded instances can sometimes decrease. We aim to determine the probability that, after the versions are switched at a selected point, the number of working instances at some point falls below the number needed for a desired Quality of Service (QoS). In this paper, we stochastically quantify this risk with a family of discrete-time Markov chains (DTMCs). Evaluation both on Amazon Web Services (AWS) and in simulation shows that our technique predicts the risk well for given version switch points.
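The risk described above can be illustrated with a deliberately simplified model. This is a minimal sketch, not the authors' DTMC family: it assumes a single birth-death chain whose state k is the number of instances running the new version after the switch point. At each step the chain either upgrades one more instance (probability p_upgrade(k)), loses one new-version instance to a failure whose replacement is launched from an old-version backup image (probability p_fail(k)), or stays at k. The function name failure_risk and all of its parameters are hypothetical. It returns the probability of dropping below min_working new-version instances before all n instances are upgraded, by solving the standard absorbing-chain equations (I - Q)x = r.

```python
from typing import Callable


def failure_risk(
    n: int,
    min_working: int,
    switch_point: int,
    p_upgrade: Callable[[int], float],
    p_fail: Callable[[int], float],
) -> float:
    """Probability that the number of new-version instances drops below
    min_working before all n instances run the new version.

    Hypothetical birth-death DTMC (an assumption, not the paper's model):
    state k = instances on the new version. From a transient state k the
    chain moves to k + 1 with p_upgrade(k), to k - 1 with p_fail(k) (a failed
    new instance is relaunched from an old-version backup image), and
    otherwise stays at k.
    """
    if not 1 <= min_working <= switch_point <= n:
        raise ValueError("need 1 <= min_working <= switch_point <= n")
    if switch_point == n:
        return 0.0  # already fully upgraded: no risk remains
    # Transient states are min_working .. n-1; index i <-> state min_working + i.
    states = list(range(min_working, n))
    size = len(states)
    # Linear system (I - Q) x = r: x[i] is the probability of absorbing in the
    # failure state (min_working - 1) from states[i]; Q is the transient block;
    # r holds the one-step probabilities of entering the failure state.
    a = [[0.0] * size for _ in range(size)]
    r = [0.0] * size
    for i, k in enumerate(states):
        up, down = p_upgrade(k), p_fail(k)
        if up < 0 or down < 0 or up + down > 1:
            raise ValueError(f"invalid probabilities at state {k}")
        a[i][i] = up + down  # equals 1 - (probability of staying at k)
        if i + 1 < size:
            a[i][i + 1] = -up  # k + 1 is still transient
        # otherwise k + 1 == n is the success state, whose x value is 0
        if i > 0:
            a[i][i - 1] = -down
        else:
            r[i] = down  # k - 1 == min_working - 1 is the failure state
    # Gaussian elimination with partial pivoting (small dense system).
    for col in range(size):
        pivot = max(range(col, size), key=lambda row: abs(a[row][col]))
        if a[pivot][col] == 0:
            raise ValueError("chain is not absorbing from every state")
        a[col], a[pivot] = a[pivot], a[col]
        r[col], r[pivot] = r[pivot], r[col]
        for row in range(col + 1, size):
            factor = a[row][col] / a[col][col]
            for j in range(col, size):
                a[row][j] -= factor * a[col][j]
            r[row] -= factor * r[col]
    # Back substitution on the upper-triangular system.
    x = [0.0] * size
    for i in range(size - 1, -1, -1):
        x[i] = (r[i] - sum(a[i][j] * x[j] for j in range(i + 1, size))) / a[i][i]
    return x[switch_point - min_working]
```

For example, `failure_risk(20, 15, 17, lambda k: 0.6, lambda k: 1 - 0.99 ** k)` evaluates a hypothetical 20-instance fleet that needs at least 15 new-version instances, switches versions once 17 are upgraded, and assumes each new-version instance fails independently with probability 0.01 per step; comparing the result across candidate switch points is one way to choose a switch point. The system is solved with plain Gaussian elimination to stay within the standard library. With constant probabilities the chain reduces to the gambler's-ruin problem, whose closed form can be used to check the solver.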
Keywords :
Markov processes; Web services; cloud computing; configuration management; quality of service; risk management; virtual machines; AWS; Amazon Web service; DTMC; backup images; cloud platform; discrete Markov chains; failure risk quantification; industry technique; online dynamic software update; rolling upgrade updates; version switch point; virtual machine instances; wave rolling; Fault tolerance; Fault tolerant systems; Software systems; Switches; Virtual machining; Cloud operation; Risk; Rolling upgrade; Software dependability; Stochastic model;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Big Data and Cloud Computing (BdCloud), 2014 IEEE Fourth International Conference on
Conference_Location :
Sydney, NSW
Type :
conf
DOI :
10.1109/BDCloud.2014.16
Filename :
7034783