DocumentCode :
579757
Title :
An OS-Hypervisor Infrastructure for Automated OS Crash Diagnosis and Recovery in a Virtualized Environment
Author :
Jann, Joefon ; Burugula, R. Sarma ; Wu, Ching-Farn E. ; El Maghraoui, Kaoutar
Author_Institution :
IBM T.J. Watson Res. Center, Yorktown Heights, NY, USA
fYear :
2012
fDate :
24-26 Oct. 2012
Firstpage :
195
Lastpage :
202
Abstract :
Recovering from OS crashes has traditionally been done using reboot or checkpoint-restart mechanisms. Such techniques either fail to preserve the state before the crash happens or require modifications to applications. To eliminate these problems, we present a novel OS-hypervisor infrastructure for automated OS crash diagnosis and recovery in virtual servers. Our approach uses a small hidden OS-repair-image that is dynamically created from the healthy running OS instance. Upon an OS crash, the hypervisor automatically loads this repair-image to perform diagnosis and repair. The offending process is then quarantined, and the fixed OS automatically resumes running without a reboot. Our experimental evaluations demonstrated that it takes less than 3 seconds to recover from an OS crash. This approach can significantly reduce the downtime and maintenance costs in data centers. This is the first design and implementation of an OS-hypervisor combo capable of automatically resurrecting a crashed commercial server-OS.
Keywords :
checkpointing; computer centres; operating systems (computers); software maintenance; software reliability; OS-hypervisor infrastructure; automated OS crash diagnosis; automated OS crash recovery; checkpoint-restart mechanism; data center; hidden OS-repair-image; maintenance cost; reboot; reliability; virtual server; virtualized environment; Computer crashes; Data structures; Hardware; Kernel; Maintenance engineering; Registers; Virtual machine monitors; Availability; Computer Crash; Operating Systems; Reliability; System Recovery
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Computer Architecture and High Performance Computing (SBAC-PAD), 2012 IEEE 24th International Symposium on
Conference_Location :
New York, NY
ISSN :
1550-6533
Print_ISBN :
978-1-4673-4790-7
Type :
conf
DOI :
10.1109/SBAC-PAD.2012.10
Filename :
6374789