• DocumentCode
    33570
  • Title

    VM-μCheckpoint: Design, Modeling, and Assessment of Lightweight In-Memory VM Checkpointing

  • Author

    Wang, Long ; Kalbarczyk, Zbigniew ; Iyer, Ravishankar K. ; Iyengar, Arun

  • Author_Institution
    IBM T. J. Watson Res. Center, Yorktown Heights, NY, USA
  • Volume
    12
  • Issue
    2
  • fYear
    2015
  • fDate
    March-April 2015
  • Firstpage
    243
  • Lastpage
    255
  • Abstract
    Checkpointing and rollback techniques enhance the reliability and availability of virtual machines and their hosted IT services. This paper proposes VM-μCheckpoint, a lightweight pure-software mechanism for high-frequency checkpointing and rapid recovery of VMs. Compared with existing VM checkpointing techniques, VM-μCheckpoint aims to minimize checkpoint overhead and speed up recovery by means of copy-on-write, dirty-page prediction, and in-place recovery, as well as by saving incremental checkpoints in volatile memory. Moreover, VM-μCheckpoint addresses the problem that error detection latency can result in corrupted checkpoints, particularly when the checkpointing frequency is high. We also constructed Markov models to study the availability improvements provided by VM-μCheckpoint (from 99 to 99.98 percent on reasonably reliable hypervisors). We designed and implemented VM-μCheckpoint in the Xen VMM. The evaluation results demonstrate that VM-μCheckpoint incurs an average overhead of 6.3 percent (in terms of program execution time) for 50 ms checkpoint intervals when executing the SPEC CINT 2006 benchmark. Error injection experiments demonstrate that VM-μCheckpoint, combined with error detection techniques in RMK, provides high recovery coverage.
  • Keywords
    Markov processes; checkpointing; error detection; virtual machines; Markov models; SPEC CINT 2006 benchmark; VM-μCheckpoint; Xen VMM; copy-on-write; dirty-page prediction; error detection; error injection experiments; in-place recovery; lightweight in-memory VM checkpointing; lightweight pure-software mechanism; rollback techniques; virtual machines; volatile memory; Availability; Checkpointing; Computer crashes; Pins; Transient analysis; Virtual machine monitors; Checkpoint corruption; checkpoint model; error latency; high-frequency checkpoint; incremental checkpoint; transient error
  • fLanguage
    English
  • Journal_Title
    Dependable and Secure Computing, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1545-5971
  • Type

    jour

  • DOI
    10.1109/TDSC.2014.2327967
  • Filename
    6824750