DocumentCode
579757
Title
An OS-Hypervisor Infrastructure for Automated OS Crash Diagnosis and Recovery in a Virtualized Environment
Author
Jann, Joefon ; Burugula, R. Sarma ; Wu, Ching-Farn E. ; El Maghraoui, Kaoutar
Author_Institution
IBM T.J. Watson Res. Center, Yorktown Heights, NY, USA
fYear
2012
fDate
24-26 Oct. 2012
Firstpage
195
Lastpage
202
Abstract
Recovering from OS crashes has traditionally been done using reboot or checkpoint-restart mechanisms. Such techniques either fail to preserve the state before the crash happens or require modifications to applications. To eliminate these problems, we present a novel OS-hypervisor infrastructure for automated OS crash diagnosis and recovery in virtual servers. Our approach uses a small hidden OS-repair-image that is dynamically created from the healthy running OS instance. Upon an OS crash, the hypervisor automatically loads this repair-image to perform diagnosis and repair. The offending process is then quarantined, and the fixed OS automatically resumes running without a reboot. Our experimental evaluations demonstrated that it takes less than 3 seconds to recover from an OS crash. This approach can significantly reduce the downtime and maintenance costs in data centers. This is the first design and implementation of an OS-hypervisor combo capable of automatically resurrecting a crashed commercial server-OS.
Keywords
checkpointing; computer centres; operating systems (computers); software maintenance; software reliability; OS-hypervisor infrastructure; automated OS crash diagnosis; automated OS crash recovery; checkpoint-restart mechanism; data center; hidden OS-repair-image; maintenance cost; reboot; reliability; virtual server; virtualized environment; Computer crashes; Data structures; Hardware; Kernel; Maintenance engineering; Registers; Virtual machine monitors; Availability; Computer Crash; Operating Systems; Reliability; System Recovery;
fLanguage
English
Publisher
IEEE
Conference_Titel
Computer Architecture and High Performance Computing (SBAC-PAD), 2012 IEEE 24th International Symposium on
Conference_Location
New York, NY
ISSN
1550-6533
Print_ISBN
978-1-4673-4790-7
Type
conf
DOI
10.1109/SBAC-PAD.2012.10
Filename
6374789