DocumentCode :
3441907
Title :
Checkpoint and rollback in asynchronous distributed systems
Author :
Higaki, Hiroaki ; Shima, Kenji ; Tachikawa, Takayuki ; Takizawa, Makoto
Author_Institution :
Dept. of Comput. & Syst. Eng., Tokyo Denki Univ., Japan
Volume :
3
fYear :
1997
fDate :
7-12 Apr 1997
Firstpage :
998
Abstract :
This paper proposes a novel algorithm for taking checkpoints and rolling back processes for recovery in asynchronous distributed systems. The algorithm has the following properties: (1) multiple processes can initiate checkpointing simultaneously; (2) no additional messages are transmitted for taking checkpoints; (3) a set of local checkpoints taken by multiple processes denotes a consistent global state; (4) multiple processes can initiate rollback recovery simultaneously; (5) the minimum number of processes is rolled back; and (6) each process is rolled back asynchronously. The number of messages for rolling back the processes is O(l), where l is the number of channels. Therefore, the system is kept highly available by the algorithm presented.
Keywords :
computer network reliability; distributed processing; algorithm; asynchronous distributed systems; channels; checkpoint; consistent global state; information systems; multiple processes; rollback recovery; Application software; Availability; Checkpointing; Distributed computing; Fault tolerant systems; Hardware; Information systems; Internet; Protocols; Systems engineering and theory
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
INFOCOM '97. Sixteenth Annual Joint Conference of the IEEE Computer and Communications Societies. Driving the Information Revolution., Proceedings IEEE
Conference_Location :
Kobe
ISSN :
0743-166X
Print_ISBN :
0-8186-7780-5
Type :
conf
DOI :
10.1109/INFCOM.1997.631114
Filename :
631114