Title :
Radiata: Enabling Whole System Hot-mirroring via Continual State Replication
Author :
Chen, Yang ; Hu, Chunming ; Wo, Tianyu
Author_Institution :
Sch. of Comput. Sci. & Eng., Beihang Univ., Beijing, China
Abstract :
Checkpoint-recovery based on system virtualization is an attractive approach to providing transparent and economical fault tolerance in virtualized environments. Previous approaches, however, either incur severe performance degradation or are complex to implement. In this work, we propose Radiata, a whole-system hot-mirroring platform that provides fault tolerance for any type of service by encapsulating the service instance in a virtual machine and hot-mirroring the virtual machine's state changes via continual state replication. Our approach exploits three key optimizations to further reduce the performance overhead: asynchronous state replication, copy-on-write (COW) based memory checkpointing, and dirty page prediction. We have implemented a prototype on the KVM platform. Comprehensive evaluations under a variety of workloads demonstrate that Radiata supports rapid and transparent fail-over in the event of unexpected hardware failure, and that it incurs less performance degradation than existing mechanisms under failure-free conditions.
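The general idea of continual state replication with asynchronous hand-off can be illustrated with a toy model. The sketch below is not Radiata's implementation and does not touch KVM; the names Primary, Backup, replicator and PAGE_SIZE are hypothetical, "memory" is a Python list of small byte strings, and immutable bytes objects stand in for copy-on-write snapshots. Dirty page prediction is not modelled.

```python
import queue
import threading

PAGE_SIZE = 4  # toy page size in bytes (assumption; real pages are far larger)


class Primary:
    """Toy 'VM' whose memory is a list of fixed-size pages."""

    def __init__(self, num_pages):
        self.pages = [bytes(PAGE_SIZE) for _ in range(num_pages)]
        self.dirty = set()  # page numbers written since the last checkpoint

    def write(self, page_no, data):
        self.pages[page_no] = data
        self.dirty.add(page_no)

    def checkpoint(self):
        """Capture the pages dirtied in this epoch and start a new epoch.

        bytes objects are immutable, so later writes replace a page instead
        of mutating it; the captured delta stays stable, loosely mimicking
        a copy-on-write snapshot.
        """
        delta = {n: self.pages[n] for n in sorted(self.dirty)}
        self.dirty.clear()
        return delta


class Backup:
    """Mirror that applies checkpoints in the order they were taken."""

    def __init__(self, num_pages):
        self.pages = [bytes(PAGE_SIZE) for _ in range(num_pages)]

    def apply(self, delta):
        for page_no, data in delta.items():
            self.pages[page_no] = data


def replicator(channel, backup):
    """Background thread: drains checkpoints so the primary does not wait."""
    while True:
        delta = channel.get()
        if delta is None:  # sentinel: shut down
            channel.task_done()
            break
        backup.apply(delta)
        channel.task_done()


primary, backup = Primary(8), Backup(8)
channel = queue.Queue()
worker = threading.Thread(target=replicator, args=(channel, backup))
worker.start()

primary.write(1, b"aaaa")
primary.write(5, b"bbbb")
channel.put(primary.checkpoint())  # epoch 1 handed off; primary keeps running
primary.write(1, b"cccc")          # belongs to epoch 2, not yet checkpointed

channel.join()                     # wait here only to make the demo deterministic
channel.put(None)
worker.join()
print(backup.pages[1], backup.pages[5])
```

In this model the backup always holds the state of the last completed checkpoint, so a fail-over would resume from epoch 1 and lose the uncheckpointed write to page 1. Only the dirty pages of each epoch cross the channel, which is why tracking (and, in the paper, predicting) dirty pages matters for overhead.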
Keywords :
checkpointing; failure analysis; fault tolerance; optimisation; virtual machines; COW-based memory checkpoint; KVM; Radiata; asynchronous state replication; checkpoint recovery; continual state replication; economic fault tolerance service; failure free condition; hot mirroring platform; performance overhead; prototype system; transparent fault tolerance service; virtual environment; virtual machine; workload demonstration; Degradation; Fault tolerant systems; Hardware; Servers; Virtual machining; COW-based checkpoint; continual replication; hot-mirroring
Conference_Title :
2012 IEEE 14th International Conference on High Performance Computing and Communication & 2012 IEEE 9th International Conference on Embedded Software and Systems (HPCC-ICESS)
Conference_Location :
Liverpool
Print_ISBN :
978-1-4673-2164-8
DOI :
10.1109/HPCC.2012.70