• DocumentCode
    1374563
  • Title

    Optimizing the Performance of Virtual Machine Synchronization for Fault Tolerance

  • Author

    Zhu, Jun ; Jiang, Zhefu ; Xiao, Zhen ; Li, Xiaoming

  • Author_Institution
    Sch. of Electron. Eng. & Comput. Sci., Peking Univ., Beijing, China
  • Volume
    60
  • Issue
    12
  • fYear
    2011
  • Firstpage
    1718
  • Lastpage
    1729
  • Abstract
    Hypervisor-based fault tolerance (HBFT), which synchronizes the state between the primary VM and the backup VM at short intervals of tens to hundreds of milliseconds, is an emerging approach to sustaining mission-critical applications. Based on virtualization technology, HBFT provides an economical and transparent fault-tolerant solution. However, these advantages currently come at the cost of substantial performance overhead during failure-free operation, especially for memory-intensive applications. This paper presents an in-depth examination of HBFT and options to improve its performance. Based on the behavior of memory accesses among checkpointing epochs, we introduce two optimizations, read-fault reduction and write-fault prediction, for the memory tracking mechanism. These two optimizations improve the performance by 31 percent and 21 percent, respectively, for some applications. Then, we present software superpage, which efficiently maps large memory regions between virtual machines (VMs). Our optimization improves the performance of HBFT by a factor of 1.4 to 2.2 and achieves about 60 percent of that of the native VM.
  • Keywords
    checkpointing; fault tolerant computing; optimisation; synchronisation; virtual machines; virtual storage; virtualisation; HBFT; checkpointing epochs; fault tolerance; hypervisor based fault tolerance; memory access; memory tracking mechanism; optimization; read-fault reduction; virtual machine synchronization; virtualization technology; write-fault prediction; Computer architecture; Fault tolerance; Fault tolerant systems; Synchronization; Virtual machine monitors; Virtualization; checkpoint; fault tolerance; hypervisor; recovery
  • fLanguage
    English
  • Journal_Title
    Computers, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    0018-9340
  • Type

    jour

  • DOI
    10.1109/TC.2010.224
  • Filename
    5629326