DocumentCode
3423475
Title
A timestamp-based checkpointing protocol for long-lived distributed computations
Author
Cristian, Flaviu ; Jahanian, Farnam
Author_Institution
IBM Almaden Res. Center, San Jose, CA, USA
fYear
1991
fDate
30 Sep-2 Oct 1991
Firstpage
12
Lastpage
20
Abstract
The authors present a timestamp-based protocol for checkpointing the global state of a long-lived distributed computation in an environment in which processor clocks are approximately synchronized. The protocol is based on periodic checkpointing of local process states and logging of incoming messages during a short bounded interval. It tolerates process crash and performance failures as well as network omission and performance failures. The proposed approach has the advantage of optimistic logging protocols in that it does not require synchronous logging of each message on stable storage. The approach also has the advantage of pessimistic logging protocols in that it avoids the domino effect by recovering to the most recent successful local checkpoint.
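The following is a minimal illustrative sketch, not the authors' protocol. It shows why the abstract can say message logging is confined to a short bounded interval when clocks are approximately synchronized. The sketch rests on several assumptions of its own. Every process takes local checkpoint k when its clock reads k·P. Each message carries the sender's clock reading at send time. Any two clocks differ by at most `epsilon`. A message is delivered within `delta` time units as measured on the receiver's clock, with clock drift ignored. The names `Params`, `MsgClass`, `classify`, `logging_window` and `is_late` are hypothetical. Under these assumptions, a message sent before checkpoint k must reach the receiver before its clock reads k·P + epsilon + delta. The receiver therefore only has to log "in-transit" messages that arrive during that window.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple


class MsgClass(Enum):
    PRE = "pre"                # sent and received before checkpoint k: already in the receiver's state
    IN_TRANSIT = "in_transit"  # sent before, received after checkpoint k: must be logged
    POST = "post"              # sent and received after checkpoint k: not part of checkpoint k
    ORPHAN = "orphan"          # sent after, received before checkpoint k: would make checkpoint k inconsistent


@dataclass(frozen=True)
class Params:
    period: float   # hypothetical P: local checkpoint k is taken when the local clock reads k * P
    epsilon: float  # assumed bound on the skew between any two processor clocks
    delta: float    # assumed bound on message delay, measured on the receiver's clock


def checkpoint_time(k: int, p: Params) -> float:
    """Local clock reading at which every process takes checkpoint k."""
    return k * p.period


def logging_window(k: int, p: Params) -> Tuple[float, float]:
    """Receiver-clock interval [start, end) in which in-transit messages for checkpoint k can arrive."""
    t = checkpoint_time(k, p)
    return (t, t + p.epsilon + p.delta)


def classify(send_ts: float, recv_clock: float, k: int, p: Params) -> MsgClass:
    """Classify a message relative to checkpoint k using the sender timestamp and the receiver clock."""
    t = checkpoint_time(k, p)
    sent_before = send_ts < t
    received_before = recv_clock < t
    if sent_before and received_before:
        return MsgClass.PRE
    if sent_before:
        return MsgClass.IN_TRANSIT
    if received_before:
        return MsgClass.ORPHAN
    return MsgClass.POST


def is_late(send_ts: float, recv_clock: float, k: int, p: Params) -> bool:
    """True if a pre-checkpoint message arrives after the window closes, i.e. the assumed bounds were violated."""
    _, end = logging_window(k, p)
    return send_ts < checkpoint_time(k, p) and recv_clock >= end


if __name__ == "__main__":
    p = Params(period=10.0, epsilon=0.5, delta=2.0)
    print(logging_window(1, p))             # window for checkpoint 1
    print(classify(9.8, 11.0, 1, p))        # sent before, received after: log it
    print(is_late(9.8, 12.6, 1, p))         # arrives after the window: timing failure
```

In this sketch a receiver would log a message only when `classify` returns `IN_TRANSIT` and `is_late` is false. Once its clock passes the end of the window, the log for checkpoint k is complete. It can then be written to stable storage in one step, rather than synchronously per message. Handling orphan messages (for example, buffering them until the receiver's own checkpoint) and handling late messages (performance failures) are left open here. The paper's actual treatment may differ.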
Keywords
performance evaluation; protocols; domino effect; long-lived distributed computations; optimistic logging protocols; performance failures; process crash; processor clocks; timestamp-based checkpointing protocol; Checkpointing; Clocks; Communication channels; Computer crashes; Degradation; Distributed computing; Hardware; Protocols; Synchronization; System recovery
fLanguage
English
Publisher
IEEE
Conference_Titel
Reliable Distributed Systems, 1991. Proceedings., Tenth Symposium on
Conference_Location
Pisa
Print_ISBN
0-8186-2260-1
Type
conf
DOI
10.1109/RELDIS.1991.145399
Filename
145399