DocumentCode :
2500327
Title :
Concurrent robust checkpointing and recovery in distributed systems
Author :
Leu, Pei-jyun ; Bhargava, Bharat
Author_Institution :
Dept. of Comput. Sci., Purdue Univ., West Lafayette, IN, USA
fYear :
1988
fDate :
1-5 Feb 1988
Firstpage :
154
Lastpage :
163
Abstract :
A checkpoint/rollback algorithm is presented for multiple processes in a distributed system that uses message passing for communication. Each process in the system can initiate the algorithm autonomously. If only one instance of the algorithm is executing, the algorithm forces the minimal number of additional processes other than the initiator to make checkpoints (or roll back). The contributions of this research are as follows: (1) global checkpointing instances and rollback instances initiated by several processes can execute concurrently, and no deadlock or livelock occurs among them; (2) the algorithm is resilient to multiple process failures and handles network partitioning in a pessimistic way; and (3) the algorithm does not require that messages be received in the order in which they are sent.
Keywords :
distributed databases; concurrent robust checkpointing; distributed systems; message passing; multiple process failures; recovery; rollback algorithm; Checkpointing; Concurrent computing; Content addressable storage; Distributed computing; Interference; Merging; Message passing; NASA; Partitioning algorithms; Robustness
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Data Engineering, 1988. Proceedings. Fourth International Conference on
Conference_Location :
Los Angeles, CA
Print_ISBN :
0-8186-0827-7
Type :
conf
DOI :
10.1109/ICDE.1988.105457
Filename :
105457