Title :
Resilient Virtual Clusters
Author :
Le, Michael ; Hsu, Israel ; Tamir, Yuval
Author_Institution :
Comput. Sci. Dept., UCLA, Los Angeles, CA, USA
Abstract :
Clusters of computers can provide, in aggregate, reliable services despite the failure of individual computers. System-level virtualization is widely used to consolidate the workload of multiple physical systems as multiple virtual machines (VMs) on a single physical computer. A single physical computer thus forms a "virtual cluster" of VMs. A key difficulty with virtualization is that the failure of the virtualization infrastructure (VI) often leads to the failure of multiple VMs. This is likely to overload "cluster computing" resiliency mechanisms, typically designed to tolerate the failure of only a single node at a time. By supporting recovery from failure of key VI components, we have enhanced the resiliency of a VI (Xen), thus enabling the use of existing "cluster computing" techniques to provide resilient virtual clusters. In the overwhelming majority of cases, these enhancements allow recovery from errors in the VI to be accomplished without the failure of more than a single VM. The resulting resiliency of the virtual cluster is demonstrated by running two existing "cluster computing" systems while subjecting the VI to injected faults.
Keywords :
Linux; middleware; pattern clustering; system recovery; virtual machines; cluster computing resiliency mechanism; individual computer failure; key VI component; multiple physical system; multiple virtual machine; physical computer; reliable service; resilient virtual cluster; system-level virtualization; virtualization infrastructure; Computer crashes; Computers; Fault tolerance; Fault tolerant systems; Middleware; Servers; Cluster; Microreboot; Middleware; Recovery; Reliability; Virtualization;
Conference_Title :
Dependable Computing (PRDC), 2011 IEEE 17th Pacific Rim International Symposium on
Conference_Location :
Pasadena, CA
Print_ISBN :
978-1-4577-2005-5
Electronic_ISBN :
978-0-7695-4590-5
DOI :
10.1109/PRDC.2011.33