Title :
A communication-induced checkpointing protocol that ensures rollback-dependency trackability
Author :
Baldoni, R. ; Helary, J.-M. ; Mostefaoui, A. ; Raynal, M.
Author_Institution :
Rome Univ., Italy
Abstract :
Considering an application in which processes take local checkpoints independently (called basic checkpoints), this paper develops a protocol that forces them to take some additional local checkpoints (called forced checkpoints) so that the resulting checkpoint and communication pattern satisfies the Rollback Dependency Trackability (RDT) property. This property states that all dependencies between local checkpoints are on-line trackable by using a transitive dependency vector. Compared to other protocols ensuring the RDT property, the proposed protocol is less conservative in the sense that it takes fewer additional local checkpoints. It attains this goal by a subtle tracking of causal dependencies on already taken checkpoints; this tracking is then used to prevent the occurrence of hidden dependencies. As indicated by a simulation study, the proposed protocol compares favorably with other protocols; moreover, it associates on-the-fly with each local checkpoint C the minimum global checkpoint to which C belongs.
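To make the transitive dependency vector named in the abstract concrete, the following is a minimal sketch of how such a vector is commonly maintained: each process keeps one entry per process, piggybacks the vector on every message, merges incoming vectors by componentwise maximum, and saves a copy with each checkpoint. The class and method names (`Process`, `take_checkpoint`, `send`, `receive`) are illustrative assumptions, not taken from the paper, and the sketch deliberately omits the paper's rule for deciding when a forced checkpoint must be taken.

```python
from typing import List


class Process:
    """Hypothetical process that maintains a transitive dependency vector (TDV)."""

    def __init__(self, pid: int, n: int) -> None:
        self.pid = pid
        # tdv[j] = highest checkpoint interval of process j that this
        # process is known to depend on (directly or transitively).
        self.tdv: List[int] = [0] * n
        # One saved vector per local checkpoint, in the order taken.
        self.saved: List[List[int]] = []

    def take_checkpoint(self) -> None:
        # Record the dependencies of this checkpoint, then open a new interval.
        self.saved.append(list(self.tdv))
        self.tdv[self.pid] += 1

    def send(self) -> List[int]:
        # The current vector is piggybacked on every outgoing message.
        return list(self.tdv)

    def receive(self, piggybacked: List[int]) -> None:
        # Merge the sender's knowledge by componentwise maximum.
        self.tdv = [max(a, b) for a, b in zip(self.tdv, piggybacked)]


# Small scenario: p0 -> p1 -> p2, with a checkpoint on each process.
p0, p1, p2 = Process(0, 3), Process(1, 3), Process(2, 3)
p0.take_checkpoint()      # p0 enters interval 1
p1.receive(p0.send())     # p1 now depends on p0's interval 1
p1.take_checkpoint()      # checkpoint records [1, 0, 0]
p2.receive(p1.send())     # p2 learns about p0 only through p1
p2.take_checkpoint()      # checkpoint records [1, 1, 0]
```

In this scenario the checkpoint of p2 records a dependency on p0 even though p0 never sent a message to p2 directly, which is the transitive part of the tracking. Dependencies that such a vector cannot capture are the hidden dependencies that the forced checkpoints are meant to prevent.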
Keywords :
protocols; system recovery; causal dependencies; communication pattern; communication-induced checkpointing protocol; forced checkpoints; hidden dependencies; local checkpoints; rollback dependency trackability; rollback-dependency trackability; simulation study; transitive dependency vector; Checkpointing; Communication system control; Error correction; Force control; Protocols; Resumes; Software debugging; Sufficient conditions; System recovery; Terminology
Conference_Title :
Fault-Tolerant Computing, 1997. FTCS-27. Digest of Papers., Twenty-Seventh Annual International Symposium on
Conference_Location :
Seattle, WA, USA
Print_ISBN :
0-8186-7831-3
DOI :
10.1109/FTCS.1997.614079