Title :
Optimistic failure recovery for very large networks
Author :
Lowry, Andy ; Russell, James R. ; Goldberg, Arthur P.
Author_Institution :
IBM T.J. Watson Res. Center, Yorktown Heights, NY, USA
Date :
30 Sep-2 Oct 1991
Abstract :
Optimistic failure recovery mechanisms are proposed as a way to provide transparent fault tolerance to distributed applications and systems. The authors identify problems that may arise when these mechanisms are applied to vast networks that include many processors and span large geographical areas and many administrative domains. They present a technique, recovery unit gateways, that can be used to address many of these issues with existing failure recovery algorithms. This method can be applied with minimal disruption to existing transparent recovery systems, and it can also be used to build large optimistic recovery systems while minimizing the dependency tracking overhead.
Keywords :
distributed processing; fault tolerant computing; administrative domains; dependency tracking overhead; distributed applications; distributed systems; optimistic failure recovery; recovery unit gateways; transparent fault tolerance; very large networks; Application software; Computer applications; Distributed computing; Distributed processing; Electronic mail; Fault tolerant systems; Local area networks; Memory management; Programming profession; System software
Conference_Title :
Reliable Distributed Systems, 1991. Proceedings., Tenth Symposium on
Conference_Location :
Pisa
Print_ISBN :
0-8186-2260-1
DOI :
10.1109/RELDIS.1991.145407