DocumentCode :
1510214
Title :
Analysis and evaluation of distributed checkpoint algorithms to avoid rollback propagation
Author :
Zambonelli, F.
Author_Institution :
Dipt. di Sci. dell'Ingegneria, Modena Univ., Italy
Volume :
145
Issue :
6
fYear :
1998
fDate :
12/1/1998 12:00:00 AM
Firstpage :
212
Lastpage :
218
Abstract :
Checkpointing is a well-known mechanism for achieving fault tolerance. In distributed applications where processes can checkpoint independently of each other, a local checkpoint is useful for fault-tolerance purposes only if it belongs to at least one consistent global checkpoint. In this case, execution can be restarted from it without having to roll back further into the past. The paper exploits a theoretical framework that facilitates the definition and analysis of distributed checkpoint algorithms to avoid rollback propagation. Several distributed algorithms are presented which avoid rollback propagation by forcing additional checkpoints in processes. The effectiveness of the algorithms is evaluated in several testbed applications, showing their limited capability of bounding the number of additional checkpoints.
Keywords :
distributed algorithms; software fault tolerance; system recovery; consistent global checkpoint; distributed applications; distributed checkpoint algorithms; fault tolerance; local checkpoint; rollback propagation; theoretical framework;
fLanguage :
English
Journal_Title :
Software, IEE Proceedings -
Publisher :
IET
ISSN :
1462-5970
Type :
jour
DOI :
10.1049/ip-sen:19982442
Filename :
765680