DocumentCode :
3042103
Title :
Progressive retry for software error recovery in distributed systems
Author :
Wang, Yi-Min ; Huang, Yennun ; Fuchs, W. Kent
Author_Institution :
Coordinated Sci. Lab., Illinois Univ., Urbana, IL, USA
fYear :
1993
fDate :
22-24 June 1993
Firstpage :
138
Lastpage :
144
Abstract :
A method of execution retry for bypassing software faults based on checkpointing, rollback, message reordering, and replaying is described. The authors demonstrate how rollback techniques, previously developed for transient hardware failure recovery, can also be used to recover from software errors by exploiting message reordering to bypass software faults. The approach intentionally increases the degree of nondeterminism and the scope of rollback when a previous retry fails. Examples from experience with telecommunications software systems illustrate the benefits of the scheme.
Keywords :
software fault tolerance; checkpointing; distributed systems; execution retry; message reordering; nondeterminism; progressive retry; rollback; software error recovery; telecommunications software systems; transient hardware failure recovery; Checkpointing; Computer errors; Contracts; Costs; Hardware; NASA; Protocols; Runtime; Software systems; Testing;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Fault-Tolerant Computing, 1993. FTCS-23. Digest of Papers., The Twenty-Third International Symposium on
Conference_Location :
Toulouse, France
ISSN :
0731-3071
Print_ISBN :
0-8186-3680-7
Type :
conf
DOI :
10.1109/FTCS.1993.627317
Filename :
627317