Title :
The performance of consistent checkpointing
Author :
Elnozahy, Elmootazbellah Nabil ; Johnson, David B. ; Zwaenepoel, Willy
Author_Institution :
Dept. of Comput. Sci., Rice Univ., Houston, TX, USA
Abstract :
Consistent checkpointing provides transparent fault tolerance for long-running distributed applications. Performance measurements of an implementation of consistent checkpointing are described. The measurements show that consistent checkpointing performs remarkably well. Eight computation-intensive distributed applications were executed on a network of 16 diskless Sun-3/60 workstations, and the performance without checkpointing was compared to the performance with consistent checkpoints taken at two-minute intervals. For six of the eight applications, the running time increased by less than 1% as a result of the checkpointing. The highest overhead measured was 5.8%. Incremental checkpointing and copy-on-write checkpointing were the most effective techniques in lowering the running time overhead. It is argued that these measurements show that consistent checkpointing is an efficient way to provide fault tolerance for long-running distributed applications.
Keywords :
data integrity; distributed processing; fault tolerant computing; computation-intensive distributed applications; consistent checkpointing; copy-on-write checkpointing; diskless Sun-3/60 workstations; long-running distributed applications; transparent fault tolerance; Application software; Checkpointing; Computer science; Costs; Fault tolerance; File servers; Magnetic heads; Measurement; Performance evaluation; Workstations
Conference_Title :
Proceedings of the 11th Symposium on Reliable Distributed Systems, 1992
Conference_Location :
Houston, TX
Print_ISBN :
0-8186-2890-1
DOI :
10.1109/RELDIS.1992.235144