Title :
Use of common time base for checkpointing and rollback recovery in a distributed system
Author :
Ramanathan, Parameswaran ; Shin, Kang G.
Author_Institution :
Dept. of Electr. & Comput. Eng., Wisconsin Univ., Madison, WI, USA
fDate :
6/1/1993
Abstract :
An approach to checkpointing and rollback recovery in a distributed computing system using a common time base is proposed. A common time base is established in the system using a hardware clock synchronization algorithm. This common time base is coupled with the idea of pseudo-recovery points to develop a checkpointing algorithm that has the following advantages: reduced wait for commitment for establishing recovery lines, fewer messages to be exchanged, and lower memory requirements. These advantages are assessed quantitatively by developing a probabilistic model.
Keywords :
distributed processing; fault tolerant computing; system recovery; checkpointing; common time base; distributed system; hardware clock synchronization algorithm; memory requirement; message exchange; probabilistic model; pseudo-recovery points; recovery lines; rollback recovery; Checkpointing; Clocks; Distributed computing; Fault tolerant systems; Hardware; NASA; Real time systems; Resumes; Synchronization; Testing;
Journal_Title :
IEEE Transactions on Software Engineering