DocumentCode :
2302546
Title :
Using checkpoints to localize the effects of faults in distributed systems
Author :
Ahamad, Mustaque ; Lin, Luke
Author_Institution :
Sch. of Inf. & Comput. Sci., Georgia Inst. of Technol., Atlanta, GA, USA
fYear :
1989
fDate :
10-12 Oct 1989
Firstpage :
2
Lastpage :
11
Abstract :
A checkpointing scheme can be used to ensure forward progress of a computation (program) even when failures occur. In a distributed system, many autonomous programs can execute concurrently and obtain services from a set of shared servers. In such a system, it is desirable to restrict a checkpoint or rollback operation to a single program to localize the effects of failures, even when processes of different programs communicate with servers. This can be achieved by a scheme based on message logging and consistent checkpoints when the system is deterministic. When the system (communication network or programs) is nondeterministic, the semantics of the server functions should be exploited to reduce the additional synchronization that needs to be introduced to ensure locality. The authors illustrate this by presenting efficient algorithms for a file server that do not require the logging of messages on stable storage.
Keywords :
distributed processing; file servers; network operating systems; synchronisation; system recovery; autonomous programs; checkpointing scheme; distributed systems; file server; forward progress; message logging; rollback operation; semantics; server functions; shared servers; stable storage; synchronization; Checkpointing; Communication networks; Computer science; Concurrent computing; Costs; Distributed computing; File servers; Impedance; Network servers; Resumes;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Reliable Distributed Systems, 1989., Proceedings of the Eighth Symposium on
Conference_Location :
Seattle, WA
Print_ISBN :
0-8186-1981-3
Type :
conf
DOI :
10.1109/RELDIS.1989.72743
Filename :
72743