Title :
Checkpoint processing in distributed systems software using synchronized clocks
Author :
Neogy, S. ; Sinha, A. ; Das, P.K.
Author_Institution :
Dept. of Comput. Sci. Eng., Calcutta Univ., India
Abstract :
Taking checkpoints in a truly distributed manner, that is, in the absence of a global checkpoint coordinator, is difficult. This problem is addressed in a system that uses loosely synchronized clocks. The constituent processes take their checkpoints according to their own clocks at predetermined checkpoint instants. Because these checkpoints are asynchronous, some synchronization among them is needed to determine a globally consistent set of checkpoints. Synchronization information is appended to the clock synchronization messages, which the constituent processes use for checkpoint synchronization. Communication in this system is synchronous, so processes may be blocked for communication at the checkpointing instants. The blocked processes take their checkpoints after they unblock. It is shown that the set of such i-th checkpoints is consistent, and hence the rollback required by the system if a failure occurs is only up to the last saved state.
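The checkpointing rule described in the abstract can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the authors' algorithm: the names `Process`, `simulate`, and `same_index_cut`, the integer clock, the fixed `clock_offset`, and the single `blocked_until` interval are all hypothetical simplifications. The synchronization information carried on clock synchronization messages is not modeled. Each process takes its i-th checkpoint when its own clock reaches the i-th predetermined instant, or as soon as it unblocks if it was blocked at that instant.

```python
from dataclasses import dataclass, field


@dataclass
class Process:
    """Hypothetical model of one constituent process."""
    name: str
    clock_offset: int = 0    # assumed fixed skew of the local clock
    blocked_until: int = 0   # assumed local time at which blocking ends
    checkpoints: list = field(default_factory=list)  # (index, local_time)

    def local_time(self, true_time: int) -> int:
        return true_time + self.clock_offset

    def tick(self, true_time: int, interval: int) -> None:
        now = self.local_time(true_time)
        due_index = now // interval            # instants reached so far
        next_index = len(self.checkpoints) + 1
        # Take the next checkpoint once its instant has passed, but
        # defer it while the process is blocked in communication.
        if next_index <= due_index and now >= self.blocked_until:
            self.checkpoints.append((next_index, now))


def simulate(processes, interval: int, end_time: int) -> None:
    """Advance an idealized global time one unit at a time."""
    for t in range(end_time + 1):
        for p in processes:
            p.tick(t, interval)


def same_index_cut(processes, i: int) -> dict:
    """Collect the i-th checkpoint of every process (local times)."""
    return {p.name: p.checkpoints[i - 1][1] for p in processes}


if __name__ == "__main__":
    a = Process("A")
    b = Process("B", clock_offset=1, blocked_until=12)
    simulate([a, b], interval=10, end_time=25)
    print(same_index_cut([a, b], 1))
    print(same_index_cut([a, b], 2))
```

In this sketch, process B is blocked when its first checkpoint instant arrives and so records that checkpoint only after unblocking, while both processes record the same checkpoint indices. Whether such a same-index set is actually consistent depends on the argument in the paper, which the sketch does not reproduce.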
Keywords :
clocks; distributed programming; synchronisation; system recovery; blocked processes; checkpoint processing; checkpoint-synchronization; checkpointing instants; clock synchronization messages; distributed systems software; global consistent set; i-th checkpoints; last saved state; predetermined checkpoint instants; rollback; synchronized clocks; Algorithm design and analysis; Checkpointing; Clocks; Hardware; Message passing; Synchronization; System software
Conference_Title :
Information Technology: Coding and Computing, 2001. Proceedings. International Conference on
Conference_Location :
Las Vegas, NV
Print_ISBN :
0-7695-1062-0
DOI :
10.1109/ITCC.2001.918855