DocumentCode
1269972
Title
Effective and concurrent checkpointing and recovery in distributed systems
Author
Hou, C.J. ; Tsoi, K.S. ; Han, C.C.
Author_Institution
Dept. of Electr. Eng., Ohio State Univ., Columbus, OH, USA
Volume
144
Issue
5
fYear
1997
fDate
9/1/1997 12:00:00 AM
Firstpage
304
Lastpage
316
Abstract
The paper presents an effective application-transparent checkpointing/rollback scheme for multiple processes that communicate via message passing in a distributed system. The authors first propose a checkpointing scheme that uses the unforced checkpointing strategy and dynamically varies checkpoint intervals according to the frequency of message sending, in order to reduce process rollback propagation. Additional forced checkpoints are taken only to achieve checkpoint consistency among processes and to avoid the domino effect. The authors then discuss both global rollback and minimal rollback approaches and incorporate them into the proposed checkpointing scheme. The combined checkpointing/rollback scheme can handle out-of-order messages, achieves high concurrency during checkpointing/rollback operations, and allows multiple checkpointing/rollback instances to be invoked. To reduce the space overhead, a global recovery line determination approach is proposed to purge checkpoints to which processes will never roll back. Experience with event-driven simulation indicates that the proposed scheme effectively reduces rollback propagation while incurring little control message overhead and maintaining only a few checkpoints at each process at any time.
Keywords
concurrency control; message passing; system recovery; application-transparent checkpointing; concurrent checkpointing; control message overhead; distributed systems; domino effect; event driven simulation; global rollback; minimal rollback; out-of-order messages; process rollback propagation; recovery; rollback scheme;
fLanguage
English
Journal_Title
Computers and Digital Techniques, IEE Proceedings -
Publisher
IET
ISSN
1350-2387
Type
jour
DOI
10.1049/ip-cdt:19971527
Filename
627909