• DocumentCode
    579757
  • Title

    An OS-Hypervisor Infrastructure for Automated OS Crash Diagnosis and Recovery in a Virtualized Environment

  • Author

    Jann, Joefon ; Burugula, R. Sarma ; Wu, Ching-Farn E. ; El Maghraoui, Kaoutar

  • Author_Institution
    IBM T.J. Watson Res. Center, Yorktown Heights, NY, USA
  • fYear
    2012
  • fDate
    24-26 Oct. 2012
  • Firstpage
    195
  • Lastpage
    202
  • Abstract
    Recovering from OS crashes has traditionally been done using reboot or checkpoint-restart mechanisms. Such techniques either fail to preserve the state before the crash happens or require modifications to applications. To eliminate these problems, we present a novel OS-hypervisor infrastructure for automated OS crash diagnosis and recovery in virtual servers. Our approach uses a small hidden OS-repair-image that is dynamically created from the healthy running OS instance. Upon an OS crash, the hypervisor automatically loads this repair-image to perform diagnosis and repair. The offending process is then quarantined, and the fixed OS automatically resumes running without a reboot. Our experimental evaluations demonstrated that it takes less than 3 seconds to recover from an OS crash. This approach can significantly reduce the downtime and maintenance costs in data centers. This is the first design and implementation of an OS-hypervisor combo capable of automatically resurrecting a crashed commercial server-OS.
  • Keywords
    checkpointing; computer centres; operating systems (computers); software maintenance; software reliability; OS-hypervisor infrastructure; automated OS crash diagnosis; automated OS crash recovery; checkpoint-restart mechanism; data center; hidden OS-repair-image; maintenance cost; reboot; reliability; virtual server; virtualized environment; Computer crashes; Data structures; Hardware; Kernel; Maintenance engineering; Registers; Virtual machine monitors; Availability; Computer Crash; Operating Systems; Reliability; System Recovery
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer Architecture and High Performance Computing (SBAC-PAD), 2012 IEEE 24th International Symposium on
  • Conference_Location
    New York, NY
  • ISSN
    1550-6533
  • Print_ISBN
    978-1-4673-4790-7
  • Type

    conf

  • DOI
    10.1109/SBAC-PAD.2012.10
  • Filename
    6374789