Title :
Can Linux be Rejuvenated without Reboots?
Author :
Yoshimura, Takeshi ; Yamada, Hiroshi ; Kono, Kenji
Author_Institution :
Keio Univ., Yokohama, Japan
Date :
Nov. 29 2011-Dec. 2 2011
Abstract :
Operating systems (OSes) are crucial for achieving high availability of computer systems. Even if the applications running on the operating system are highly available, a bug inside the kernel may result in a failure of the entire software stack. Rejuvenating OSes is a promising approach to prevent and recover from transient errors. Unfortunately, OS rejuvenation takes a lot of time because we do not have any method other than rebooting the entire OS. In this paper, we explore the possibility of rejuvenating Linux without reboots. In our previous research, we investigated the scope of error propagation in Linux. The propagation scope is process-local if the error is confined to the process context that activated it. The scope is kernel-global if the error propagates to other processes' contexts or to global data structures. If most errors are process-local, we can rejuvenate the Linux kernel without rebooting the entire kernel, because the kernel returns to a consistent and clean state simply by killing the faulting process and revoking its resources. Our conclusion is that Linux can be rejuvenated without reboots with high probability. Linux is coded defensively, and thus most of the manifested errors (96%) were process-local and only one error was kernel-global.
Keywords :
Linux; operating system kernels; Linux kernel; bug; computer system; global data structures; operating system; reboot; software stack; transient error propagation; Computer bugs; Context; Kernel; Fault Injection; Operating System Dependability; Rejuvenation; Scope of Error Propagation; Software Faults
Conference_Title :
Software Aging and Rejuvenation (WoSAR), 2011 IEEE Third International Workshop on
Conference_Location :
Hiroshima
Print_ISBN :
978-1-4673-0739-0
DOI :
10.1109/WoSAR.2011.12