• DocumentCode
    1827264
  • Title
    Radiata: Enabling Whole System Hot-mirroring via Continual State Replication
  • Author
    Chen, Yang; Hu, Chunming; Wo, Tianyu
  • Author_Institution
    Sch. of Comput. Sci. & Eng., Beihang Univ., Beijing, China
  • fYear
    2012
  • fDate
    25-27 June 2012
  • Firstpage
    469
  • Lastpage
    476
  • Abstract
    Checkpoint-recovery based on system virtualization is an attractive approach for providing transparent and economical fault tolerance in virtualized environments. Previous approaches incur either significant performance degradation or high implementation complexity. In this work, we propose Radiata, a whole-system hot-mirroring platform that provides fault tolerance for any type of service by encapsulating the service instance in a virtual machine and hot-mirroring the virtual machine's state changes via continual state replication. Our approach exploits three key optimizations to further reduce performance overhead: asynchronous state replication, copy-on-write (COW) based memory checkpointing, and dirty page prediction. We have implemented a prototype system on the KVM platform. Comprehensive evaluations under a variety of workloads demonstrate that Radiata effectively supports rapid and transparent fail-over in case of unexpected hardware failure, and that it outperforms existing mechanisms in terms of performance degradation under failure-free conditions.
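The abstract names the mechanisms but not how they fit together. The following is a toy, single-process Python sketch, under our own assumptions, of how one replication epoch might combine a COW-protected memory checkpoint with a simple dirty-page predictor. `ToyVM`, `take_checkpoint`, `replicate`, `run_demo`, and the "pages dirtied last epoch will be dirtied again" predictor are hypothetical illustrations, not Radiata's KVM implementation; the asynchronous transfer is modeled sequentially by letting the guest write before `replicate` runs, and the page contents are made-up toy data.

```python
from dataclasses import dataclass, field


@dataclass
class ToyVM:
    """Hypothetical guest memory with dirty-page tracking and COW protection."""
    pages: list[bytes]
    dirty: set[int] = field(default_factory=set)          # pages written this epoch
    protected: set[int] = field(default_factory=set)      # checkpointed, not yet sent
    cow_copies: dict[int, bytes] = field(default_factory=dict)  # checkpoint-time content
    cow_faults: int = 0                                   # lazy copies triggered by writes

    def write(self, idx: int, data: bytes) -> None:
        # Copy-on-write: preserve checkpoint-time content before the first overwrite.
        if idx in self.protected and idx not in self.cow_copies:
            self.cow_copies[idx] = self.pages[idx]
            self.cow_faults += 1
        self.pages[idx] = data
        self.dirty.add(idx)


def take_checkpoint(vm: ToyVM, predicted: set[int]) -> set[int]:
    """Pause-time work: freeze the dirty set and write-protect those pages."""
    snapshot_set = set(vm.dirty)
    vm.dirty.clear()
    vm.protected = set(snapshot_set)
    vm.cow_copies = {}
    # Hypothetical prediction step: eagerly copy pages expected to be rewritten
    # soon, so the guest does not take a COW fault on them after it resumes.
    for idx in snapshot_set & predicted:
        vm.cow_copies[idx] = vm.pages[idx]
    return snapshot_set


def replicate(vm: ToyVM, snapshot_set: set[int], backup: list[bytes]) -> None:
    """Send the checkpoint to the backup (modeled as running after the guest resumes)."""
    for idx in sorted(snapshot_set):
        # Use the preserved copy if the guest already overwrote the page.
        backup[idx] = vm.cow_copies.get(idx, vm.pages[idx])
    vm.protected.clear()
    vm.cow_copies.clear()


def run_demo() -> tuple[ToyVM, list[bytes]]:
    vm = ToyVM(pages=[b"...."] * 4)
    backup = list(vm.pages)         # backup host's copy of guest memory
    predicted: set[int] = set()     # nothing predicted before the first epoch

    # Epoch 1: guest dirties pages 0 and 2.
    vm.write(0, b"AAAA")
    vm.write(2, b"CCCC")
    snap = take_checkpoint(vm, predicted)
    predicted = snap                # predict: these pages will be dirtied again
    vm.write(2, b"cccc")            # guest rewrites page 2 before the transfer ends
    replicate(vm, snap, backup)     # backup still receives checkpoint-time b"CCCC"
    assert backup[2] == b"CCCC"

    # Epoch 2: page 2 was predicted hot, so it is copied eagerly at pause time.
    snap = take_checkpoint(vm, predicted)
    vm.write(2, b"zzzz")            # no COW fault: a copy already exists
    replicate(vm, snap, backup)
    return vm, backup


if __name__ == "__main__":
    vm, backup = run_demo()
    print(backup, vm.cow_faults)
```

The point of the COW step is that the guest can resume immediately after the short pause while the backup still receives a consistent checkpoint; the eager copy trades a little pause-time work for fewer write-protection faults, which only pays off when the predictor is accurate.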
  • Keywords
    checkpointing; failure analysis; fault tolerance; optimisation; virtual machines; COW-based memory checkpoint; KVM; Radiata; asynchronous state replication; checkpoint recovery; continual state replication; economic fault tolerance service; failure free condition; hot mirroring platform; optimization; performance overhead; prototype system; transparent fault tolerance service; virtual environment; virtual machine; workload demonstration; Degradation; Fault tolerance; Fault tolerant systems; Hardware; Servers; Virtual machining; COW-based checkpoint; continual replication; fault tolerance; hot-mirroring; virtual machine;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    High Performance Computing and Communication & 2012 IEEE 9th International Conference on Embedded Software and Systems (HPCC-ICESS), 2012 IEEE 14th International Conference on
  • Conference_Location
    Liverpool
  • Print_ISBN
    978-1-4673-2164-8
  • Type
    conf
  • DOI
    10.1109/HPCC.2012.70
  • Filename
    6332209