Title :
On staggered checkpointing
Author :
Vaidya, Nitin H.
Author_Institution :
Dept. of Comput. Sci., Texas A&M Univ., College Station, TX, USA
Abstract :
A consistent checkpointing algorithm saves a consistent view of a distributed application's state on stable storage. Traditional consistent checkpointing algorithms require different processes to save their state at about the same time. This causes contention for the stable storage, potentially resulting in large overheads. Staggering the checkpoints taken by various processes can reduce checkpoint overhead. The paper presents a simple approach to arbitrarily stagger the checkpoints. The approach requires that the processes take consistent logical checkpoints, as compared to the consistent physical checkpoints enforced by existing algorithms. Experimental results on nCube-2 are presented.
Keywords :
distributed algorithms; distributed memory systems; fault tolerant computing; hypercube networks; reliability; system recovery; checkpoint overhead reduction; consistent checkpointing algorithm; consistent logical checkpoints; distributed application state; nCube-2; stable storage; staggered checkpointing; Checkpointing; Communication system control; Computer science; Degradation; Delay; Frequency; Upper bound
Conference_Title :
Parallel and Distributed Processing, 1996., Eighth IEEE Symposium on
Conference_Location :
New Orleans, LA
Print_ISBN :
0-8186-7683-3
DOI :
10.1109/SPDP.1996.570386