DocumentCode :
2341717
Title :
Recursive restartability: turning the reboot sledgehammer into a scalpel
Author :
Candea, George ; Fox, Armando
Author_Institution :
Stanford Univ., CA, USA
fYear :
2001
fDate :
20-22 May 2001
Firstpage :
125
Lastpage :
130
Abstract :
Even after decades of software engineering research, complex computer systems still fail, primarily due to nondeterministic bugs that are typically resolved by rebooting. Conceding that Heisenbugs will remain a fact of life, we propose a systematic investigation of restarts as "high availability medicine." In this paper we show how recursive restartability (RR) - the ability of a system to gracefully tolerate restarts at multiple levels - improves fault tolerance, reduces time-to-repair, and enables system designers to build flexible, highly available software infrastructures. Using several examples of widely deployed software systems, we identify properties that are required of RR systems and outline an agenda for turning the recursive restartability philosophy into a practical software structuring tool. Finally, we describe infrastructural support for RR systems, along with initial ideas on how to analyze and benchmark such systems.
Keywords :
operating systems (computers); software fault tolerance; Heisenbugs; complex computer systems; fault tolerance; nondeterministic bugs; operating systems; recursive restartability; software engineering research; software structuring tool; time-to-repair; Availability; Computer bugs; Fault tolerant systems; Search engines; Software engineering; Software quality; Software systems; Software tools; System recovery; Turning;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Hot Topics in Operating Systems, 2001. Proceedings of the Eighth Workshop on
Print_ISBN :
0-7695-1040-X
Type :
conf
DOI :
10.1109/HOTOS.2001.990072
Filename :
990072