Title :
Reducing service failures by failure and workload aware load balancing in SaaS clouds
Author :
Roy, Anirban ; Ganesan, Rajeshwari ; Dash, Denver ; Sarkar, Santonu
Author_Institution :
Infosys Labs, Electronics City, Bangalore, India
Abstract :
SLA violations are typically viewed as service failures. If a service fails once, it will fail again unless remedial action is taken. In a virtualized environment, a common remedial action is to restart or reboot a virtual machine (VM). In this paper we present a VM live-migration policy that is aware of SLA threshold violations of workload response time and of physical machine (PM) and VM utilization, as well as availability violations at the PM and VM. The migration policy takes into account PM failures and VM (software) failures, as well as workload features such as burstiness (coefficient of variation, or CoV, > 1), which calls for caution when selecting the target PM for such workloads. The proposed policy also considers migrating a VM when the utilization of the physical machine hosting it approaches its utilization threshold. We propose an algorithm that detects proactive triggers for remedial action, selects a VM for migration, and suggests a possible target PM. Using discrete event simulation of a relevant case study, we show the efficacy of our approach by plotting the decrease in the number of SLA violations relative to existing approaches that do not trigger migration in response to non-availability-related SLA violations.
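The abstract does not give the algorithm itself, so the following is only a minimal sketch of the three ideas it names: flagging a workload as bursty when its CoV exceeds 1, raising a proactive trigger as PM utilization approaches its threshold, and choosing a target PM more cautiously for bursty workloads. All function names, the `margin` and `bursty_headroom` parameters, and the least-loaded selection rule are assumptions made for illustration, not the authors' method.

```python
from statistics import mean, pstdev


def coefficient_of_variation(samples):
    """Population standard deviation divided by the mean of the samples."""
    mu = mean(samples)
    if mu == 0:
        raise ValueError("mean of samples is zero; CoV is undefined")
    return pstdev(samples) / mu


def is_bursty(samples, cov_threshold=1.0):
    """Treat a workload as bursty when its CoV exceeds 1 (the abstract's criterion)."""
    return coefficient_of_variation(samples) > cov_threshold


def needs_migration(pm_utilization, utilization_threshold, margin=0.05):
    """Proactive trigger: fire once utilization is within `margin` of the threshold.

    `margin` is a hypothetical tuning parameter, not taken from the paper.
    """
    return pm_utilization >= utilization_threshold - margin


def pick_target_pm(pm_utilizations, vm_demand, utilization_threshold,
                   bursty, bursty_headroom=0.10):
    """Return the least-loaded PM that can host the VM, or None if none can.

    For bursty workloads the usable limit is lowered by `bursty_headroom`
    (an assumed value) so that spikes are less likely to breach the threshold.
    """
    limit = utilization_threshold - (bursty_headroom if bursty else 0.0)
    candidates = {
        pm: used for pm, used in pm_utilizations.items()
        if used + vm_demand <= limit
    }
    if not candidates:
        return None
    return min(candidates, key=candidates.get)
```

With these assumptions, a PM at 55% utilization would accept a VM needing 20% under an 80% threshold if the workload is smooth, but would be rejected for a bursty workload because the reserved headroom lowers the usable limit to about 70%.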
Keywords :
cloud computing; discrete event simulation; resource allocation; software fault tolerance; virtual machines; virtualisation; PM failures; PM utilization; SLA threshold violations; SaaS clouds; VM failures; VM live migration policy; availability violations; failure aware load balancing; physical machine utilization; proactive trigger detection; remedial action; service failure reduction; software failures; virtual machine; virtualized environment; workload aware load balancing; workload response time; Availability; Degradation; Preventive maintenance; Random variables; Software as a service; Time factors; VM migration; cloud data center; coefficient of variation; failure model; software aging
Conference_Title :
Dependable Systems and Networks Workshop (DSN-W), 2013 43rd Annual IEEE/IFIP Conference on
Conference_Location :
Budapest
DOI :
10.1109/DSNW.2013.6615511