Title :
A low overhead checkpointing and rollback recovery scheme for distributed systems
Author :
Tong, Zhijun ; Kain, Richard Y. ; Tsai, W.T.
Author_Institution :
Dept. of Electr. Eng., Minnesota Univ., Minneapolis, MN, USA
Abstract :
A major obstacle in implementing a rollback recovery scheme for fault tolerance in a concurrent distributed system is the domino effect. A low overhead checkpointing scheme is proposed to prevent this effect. Each process saves its state periodically. The state-save synchronization among processes is implemented by bounding clock drifts. A communication protocol that assures that all saved states are consistent is developed.
Keywords :
distributed processing; fault tolerant computing; network operating systems; protocols; system recovery; bounding clock drifts; communication protocol; concurrent distributed system; distributed systems; domino effect; fault tolerance; low overhead checkpointing; rollback recovery scheme; saved states; Checkpointing; Clocks; Computer science; Distributed computing; Fault detection; Fault tolerant systems; Power system reliability; Protocols; Radio access networks; Synchronization
Conference_Title :
Proceedings of the Eighth Symposium on Reliable Distributed Systems, 1989
Conference_Location :
Seattle, WA
Print_ISBN :
0-8186-1981-3
DOI :
10.1109/RELDIS.1989.72744