Title :
Finding consistent global checkpoints in a distributed computation
Author :
Manivannan, D. ; Netzer, Robert H B ; Singhal, Mukesh
Author_Institution :
Dept. of Comput. & Inf. Sci., Ohio State Univ., Columbus, OH, USA
fDate :
6/1/1997
Abstract :
Consistent global checkpoints have many uses in distributed computations. A central question in applications that use consistent global checkpoints is to determine whether a consistent global checkpoint that includes a given set of local checkpoints can exist. Netzer and Xu (1995) presented the necessary and sufficient conditions under which such a consistent global checkpoint can exist, but they did not explore what checkpoints could be constructed. In this paper, we prove exactly which local checkpoints can be used for constructing such consistent global checkpoints. We illustrate the use of our results with a simple and elegant algorithm to enumerate all such consistent global checkpoints.
Keywords :
computer network reliability; distributed processing; system recovery; causality; consistent global checkpoints; distributed checkpointing; distributed computation; failure recovery; fault tolerance; local checkpoints; Buildings; Checkpointing; Condition monitoring; Debugging; Distributed computing; Fault tolerant systems; Information science; Protocols; Sufficient conditions
Journal_Title :
Parallel and Distributed Systems, IEEE Transactions on