DocumentCode :
1417612
Title :
A unified approach to fault-tolerance in communication protocols based on recovery procedures
Author :
Agarwal, Anjali ; Atwood, J. William
Author_Institution :
Dept. of Electr. & Comput. Eng., Concordia Univ., Montreal, Que., Canada
Volume :
4
Issue :
5
fYear :
1996
fDate :
10/1/1996 12:00:00 AM
Firstpage :
785
Lastpage :
795
Abstract :
Discusses fault tolerance in computer communication protocols modeled by communicating finite state machines, and provides an efficient algorithmic procedure for recovery in such systems. Even when the communication network is reliable and preserves message order, any kind of transient error that is not detected immediately can contaminate the system and cause protocol failure. To achieve fault tolerance, the protocol must detect the error, recover from it, eventually reach a legal (or consistent) state, and resume normal execution. A protocol with the latter property, recovering and continuing execution from a legal state, is also called a self-stabilizing protocol. Our recovery procedure does not require intrusive checkpointing. The stable storage requirement for each process is lower than that of other proposed recovery procedures. The recovery procedure yields a legal protocol state: the global state before any illegal state was reached and before the effects of the error made other states illegal. Only a minimal number of processes affected by error propagation are required to roll back. Our recovery procedure can recover from any number of transient errors in the system. Our recovery procedure has also been modeled in PROMELA, a language for describing validation models, which shows the syntactic correctness of our recovery protocol design. Finally, our procedure is compared with existing approaches to handling errors, and an illustrative example is provided.
Keywords :
computer network reliability; finite state machines; protocols; software fault tolerance; telecommunication computing; PROMELA; algorithmic procedure; computer communication protocols; error propagation; fault-tolerance; finite state machines; legal state; protocol failure; recovery procedures; self-stabilizing protocol; stable storage requirement; syntactic correctness; transient error; Automata; Communication networks; Computer network reliability; Fault tolerance; Fault tolerant systems; Law; Legal factors; Maintenance; Protocols; Telecommunication network reliability;
fLanguage :
English
Journal_Title :
Networking, IEEE/ACM Transactions on
Publisher :
ieee
ISSN :
1063-6692
Type :
jour
DOI :
10.1109/90.541326
Filename :
541326