DocumentCode :
2482963
Title :
Checkpoints-on-demand with active replication
Author :
Rangarajan, Sampath ; Garg, Sachin ; Huang, Yennun
Author_Institution :
Lucent Technol., Bell Labs., Murray Hill, NJ, USA
fYear :
1998
fDate :
20-23 Oct 1998
Firstpage :
75
Lastpage :
83
Abstract :
Checkpointing and roll-back recovery is a well-known technique for recovering from software process failures. Analytical models have been developed for computing the completion time of processes that use various checkpointing strategies, such as periodic checkpointing and random checkpointing. In this paper, we show that with active replication of processes, a strategy that uses a mechanism we call checkpoints-on-demand results in an expected completion time smaller than can be achieved with traditional schemes that use periodic checkpoints. With checkpoints-on-demand, when a process fails, it is recovered from an induced checkpoint taken from a replica of the process. Recovery of persistent server processes through state transfer from a replica has been proposed in the context of group communication systems and in the process cloning approach of the Delta-4 architecture, but it has not previously been proposed and analyzed as a mechanism for reducing the expected completion time of a long-running process.
Keywords :
client-server systems; software fault tolerance; system recovery; Delta-4 architecture; active replication; checkpoints-on-demand; completion time; group communication systems; periodic checkpointing; persistent server processes; process cloning approach; random checkpointing; roll-back recovery; software process failures; Analytical models; Checkpointing; Cloning; Context; Failure analysis
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Reliable Distributed Systems, 1998. Proceedings. Seventeenth IEEE Symposium on
Conference_Location :
West Lafayette, IN
ISSN :
1060-9857
Print_ISBN :
0-8186-9218-9
Type :
conf
DOI :
10.1109/RELDIS.1998.740477
Filename :
740477