DocumentCode :
772119
Title :
On properties of RDT communication-induced checkpointing protocols
Author :
Tsai, Jichiang
Author_Institution :
Dept. of Electr. Eng., Nat. Chung-Hsing Univ., Taichung, Taiwan
Volume :
14
Issue :
8
fYear :
2003
Firstpage :
755
Lastpage :
764
Abstract :
Rollback-dependency trackability (RDT) is a property stating that all rollback dependencies between local checkpoints are trackable online by using a transitive dependency vector. The most crucial RDT characterizations introduced in the literature can be represented as certain types of RDT-PXCM-paths. Here, let the U-path and the V-path be any two types of RDT-PXCM-paths. We investigate several properties of communication-induced checkpointing protocols that ensure the RDT property. First, we prove that if an online RDT protocol encounters a U-path at a point of a checkpoint and communication pattern associated with a distributed computation, it also encounters a V-path there. Moreover, if this encountered U-path is invisibly doubled, the corresponding encountered V-path is invisibly doubled as well. Therefore, we can conclude that breaking all invisibly doubled U-paths is equivalent to breaking all invisibly doubled V-paths for an online RDT protocol. Next, we demonstrate that a visibly doubled U-path must contain a doubled U-cycle in the causal past. From these results, it can further be deduced that some different checkpointing protocols actually have the same behavior for all possible patterns. Finally, we present a commendatory systematic technique for comparing the performance of online RDT protocols.
Keywords :
fault tolerant computing; message passing; multiprocessing systems; protocols; system recovery; RDT property; RDT-PXCM-path; commendatory systematic technique; communication-induced checkpointing protocol; distributed computation; distributed system; fault tolerance; local checkpoint; online RDT protocol; rollback-dependency trackability; rollback-recovery; transitive dependency vector; Checkpointing; Communication networks; Communication system control; Computer networks; Distributed computing; Fault tolerant systems; Force control; Nonvolatile memory; Process control; Protocols;
fLanguage :
English
Journal_Title :
Parallel and Distributed Systems, IEEE Transactions on
Publisher :
IEEE
ISSN :
1045-9219
Type :
jour
DOI :
10.1109/TPDS.2003.1225055
Filename :
1225055