Author :
Takano, Ryousei ; Nakada, Hidemoto ; Hirofuchi, Takahiro ; Tanaka, Yuichi ; Kudoh, T.
Abstract :
An HPC cloud, a flexible and robust cloud computing service specially dedicated to high performance computing, is a promising future e-Science platform. In cloud computing, virtualization is widely used to achieve flexibility and security. Virtualization makes migration or checkpoint/restart of computing elements (virtual machines) easy, and such features are useful for realizing fault tolerance and server consolidation. However, in widely used virtualization schemes, I/O devices are also virtualized, and thus I/O performance is severely degraded. To cope with this problem, VMM-bypass I/O technologies, including PCI passthrough and SR-IOV, which significantly reduce the I/O overhead, have been introduced. However, such VMM-bypass I/O technologies make it impossible to migrate or checkpoint/restart virtual machines, since the virtual machines are directly attached to hardware devices. This paper proposes a novel and practical mechanism, called Symbiotic Virtualization (SymVirt), for enabling migration and checkpoint/restart on a virtualized cluster with VMM-bypass I/O devices, without virtualization overhead during normal operations. SymVirt allows a VMM to cooperate with a message passing layer on the guest OS, and realizes VM-level migration and checkpoint/restart by combining PCI hotplug with coordination of distributed VMMs. We have implemented the proposed mechanism on top of QEMU/KVM and the Open MPI system. All PCI devices, including Infiniband and Myrinet, are supported without implementing specific para-virtualized drivers, and neither the MPI runtime nor applications need to be modified. Using the proposed mechanism, we demonstrate reactive and proactive FT mechanisms on a virtualized Infiniband cluster. We have confirmed its effectiveness using both a memory-intensive micro benchmark and the NAS parallel benchmark.
Moreover, we show that postcopy live migration enables us to reduce the downtime of an application as the memory footprint increases.
Keywords :
checkpointing; cloud computing; fault tolerant computing; message passing; open systems; parallel processing; peripheral interfaces; virtual machines; virtualisation; workstation clusters; HPC cloud; I/O overhead reduction; I/O performance; Myrinet; NAS parallel benchmark; PCI devices; PCI hotplug; PCI passthrough; QEMU/KVM; SR-IOV; SymVirt; VM-level checkpoint-restart; VM-level migration; VMM-bypass I/O devices; distributed VMM coordination; e-Science platform; fault tolerance; guest OS; memory footprint; memory intensive micro benchmark; message passing layer; open MPI system; postcopy live migration; proactive FT mechanism; reactive FT mechanism; robust cloud computing; server consolidation; symbiotic virtualization; virtual machine checkpoint-restart; virtual machine monitor; virtualized HPC cluster; virtualized Infiniband cluster; virtualized cluster migration; Benchmark testing; Cloud computing; Fault tolerance; Fault tolerant systems; Virtual machine monitors; Virtual machining;