DocumentCode :
3556947
Title :
Crash recovery with little overhead
Author :
Juang, Tony T.-Y.; Venkatesan, S.
Author_Institution :
Comput. Sci. Program, Texas Univ. at Dallas, Richardson, TX, USA
fYear :
1991
fDate :
20-24 May 1991
Firstpage :
454
Lastpage :
461
Abstract :
Recovering from processor failures is an important problem in the design and development of reliable distributed systems. Two solutions to this problem that involve very little overhead are presented. The first appends no information to the messages of the application program, yet it is shown to recover from failures using O(|V||E|) messages, where |V| is the number of processors and |E| is the number of communication links in the system. The second algorithm can, under certain conditions, recover from processor failures without forcing nonfaulty processors to roll back.
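The abstract states only the message bound, so the following is a minimal, hypothetical Python simulation of a count-based rollback scheme in that spirit. It is not the paper's exact procedure. In this sketch nothing is piggybacked on application messages. Each processor keeps per-neighbor counts of messages sent and received for each of its saved states. After a failure, processors exchange their restored "sent" counts over every link and roll back any state that records more receptions from a neighbor than that neighbor's restored state claims to have sent. The `Processor` class, the `recover` function, the state representation, and the choice of exactly |V| synchronous rounds are all assumptions. With one message per direction per link per round, the total is |V|·2|E|, i.e. O(|V||E|).

```python
from dataclasses import dataclass


@dataclass
class Processor:
    # Hypothetical model: states[k] = (sent, rcvd), two dicts mapping a
    # neighbor id to a cumulative message count. states[0] must be the
    # initial state (all counts zero) so that rollback always terminates.
    states: list
    cur: int = -1  # index of the state this processor will restart from

    def __post_init__(self):
        if self.cur < 0:
            self.cur = len(self.states) - 1  # start from the latest saved state

    def sent_to(self, j):
        return self.states[self.cur][0].get(j, 0)

    def rcvd_from(self, j):
        return self.states[self.cur][1].get(j, 0)


def recover(procs, edges):
    """Run |V| synchronous rollback rounds; return the number of messages used."""
    neighbors = {i: set() for i in procs}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    messages = 0
    for _ in range(len(procs)):
        # Every processor sends its current sent-count to each neighbor:
        # one message per direction per link, i.e. 2|E| messages per round.
        inbox = {i: [(j, procs[j].sent_to(i)) for j in neighbors[i]] for i in procs}
        messages += sum(len(m) for m in inbox.values())
        for i, received in inbox.items():
            p = procs[i]
            for j, sent_by_j in received:
                # Roll back while this state records messages from j that
                # j's restored state never sent ("orphan" receptions).
                while p.rcvd_from(j) > sent_by_j:
                    p.cur -= 1
    return messages


if __name__ == "__main__":
    # Line topology 0-1-2. Processor 0 crashed and restarts from a state in
    # which it had sent only one message to 1, although 1 had received two.
    procs = {
        0: Processor([({}, {}), ({1: 1}, {})]),
        1: Processor([({}, {}), ({}, {0: 1}), ({}, {0: 2}), ({2: 1}, {0: 2})]),
        2: Processor([({}, {}), ({}, {1: 1})]),
    }
    msgs = recover(procs, [(0, 1), (1, 2)])
    print(msgs, [procs[i].cur for i in sorted(procs)])  # prints: 12 [1, 1, 0]
```

In the example, processor 1 rolls back to before its second reception from 0, and that in turn forces processor 2 to roll back to before its reception from 1. The rollback propagates one hop per round, which is why this sketch runs a fixed |V| rounds rather than stopping early.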
Keywords :
fault tolerant computing; file organisation; operating systems (computers); system recovery; application program; communication links; crash recovery; distributed systems; nonfaulty processors; processor failures; reliable systems; Checkpointing; Computer crashes; Computer science; Delay; Fault tolerant systems; Hardware; History; IEL; Protocols;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
11th International Conference on Distributed Computing Systems, 1991
Conference_Location :
Arlington, TX
Print_ISBN :
0-8186-2144-3
Type :
conf
DOI :
10.1109/ICDCS.1991.148709
Filename :
148709