Title :
A distributed consistent global checkpoint algorithm with a minimum number of checkpoints
Author :
Manabe, Yoshifumi
Author_Institution :
NTT Basic Res. Labs., Kanagawa, Japan
Abstract :
A distributed coordinated checkpointing algorithm is presented. A consistent global checkpoint is a set of process states in which no message is recorded as received by one process while not yet sent by another. The algorithm obtains a consistent global checkpoint for any checkpoint initiation by any process. Under Chandy and Lamport's assumption that one consistent global checkpoint is obtained for a set of concurrent checkpoint initiations, the total number of checkpoints is minimized. The paper then modifies this assumption to reduce the number of checkpoints further.
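The consistency condition stated in the abstract can be illustrated with a small check. This is a minimal sketch of the definition only, not the paper's algorithm: the `Message` type, the `is_consistent` function, and the representation of a checkpoint as a per-process event count are all assumptions made for illustration.

```python
from typing import Dict, List, NamedTuple


class Message(NamedTuple):
    # Hypothetical record of one message; sequence numbers are positions
    # of the send/receive events in each process's local history.
    sender: str
    send_seq: int
    receiver: str
    recv_seq: int


def is_consistent(checkpoint: Dict[str, int], messages: List[Message]) -> bool:
    """Return True if no message is recorded as received but not yet sent.

    checkpoint maps each process to the number of local events
    included in its saved state (an assumed representation).
    """
    for m in messages:
        received = m.recv_seq <= checkpoint[m.receiver]
        sent = m.send_seq <= checkpoint[m.sender]
        if received and not sent:
            return False  # orphan message: receive recorded, send not recorded
    return True


# One message: p sends at its 2nd event, q receives at its 3rd event.
messages = [Message("p", 2, "q", 3)]

consistent_cut = {"p": 2, "q": 3}   # both send and receive recorded
in_transit_cut = {"p": 2, "q": 2}   # sent but not yet received: still consistent
orphan_cut = {"p": 1, "q": 3}       # received but not yet sent: inconsistent
```

A message that is sent but not yet received (in transit) does not violate the condition; only the reverse case does.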
Keywords :
distributed algorithms; program diagnostics; checkpoint initiation; distributed coordinated checkpointing; distributed system; global checkpoint; global checkpoint algorithm; Checkpointing; Delay; Laboratories
Conference_Title :
Information Networking, 1998. (ICOIN-12) Proceedings., Twelfth International Conference on
Conference_Location :
Tokyo
Print_ISBN :
0-8186-7225-0
DOI :
10.1109/ICOIN.1998.648445