Title :
Fault tolerance of message delivery with cascading copies
Author :
Al-Jaber, H. ; Rotenstreich, S.
Author_Institution :
Dept. of Electr. Eng. & Comput. Sci., George Washington Univ., Washington, DC, USA
Abstract :
The authors present a fault-tolerance algorithm that guarantees the delivery of a message to its destination despite faults in one or more nodes in a system of loosely coupled processors. This algorithm is distinguished by not using the extra hardware or checkpoint facilities that are common to many algorithms of its type. Instead, it maintains an appropriate number of copies of the message in the nodes through which the message passes. In the case of a fault, the algorithm locates the copy of the message closest to the destination and resumes delivery of the message from this location. Failure detection and recovery are automatic and transparent to the users. The algorithm can be implemented on diskless systems, such as specialized real-time systems or parallel processing systems that use interconnection networks (e.g., a hypercube).
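The recovery idea in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not say how many copies are kept, how faults are detected, or how delivery is rerouted. The sketch assumes a fixed route, a fixed number `k` of retained copies, and a known set of failed nodes; the names `advance` and `closest_surviving_copy` are hypothetical.

```python
from collections import deque


def advance(holders, node, k):
    """Store a copy at `node` and drop the oldest copies beyond the k most recent.

    `holders` is a deque of node names ordered from oldest to newest copy.
    The fixed retention count k is an assumption of this sketch.
    """
    holders.append(node)
    while len(holders) > k:
        holders.popleft()


def closest_surviving_copy(path, holders, failed):
    """Return the live copy-holding node nearest the destination, or None.

    `path` lists the route from source to destination, so a larger index
    means the node is closer to the destination.
    """
    alive = [node for node in holders if node not in failed]
    if not alive:
        return None
    return max(alive, key=path.index)


# Hypothetical route; the message has reached C and two copies are retained.
path = ["A", "B", "C", "D", "E"]
holders = deque()
for node in ["A", "B", "C"]:
    advance(holders, node, k=2)

resume_from = closest_surviving_copy(path, holders, failed={"C"})
```

Here the copies sit at B and C once the message reaches C. If C then fails, delivery would resume from B, the surviving copy nearest the destination. If every copy holder fails, the function returns `None`, which suggests why the number of retained copies has to match the number of faults to be tolerated.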
Keywords :
fault tolerant computing; cascading copies; diskless systems; failure detection; fault-tolerance algorithm; hypercube; interconnection networks; message delivery; parallel processing systems; recovery; specialized real-time systems; Distributed computing; Fault tolerance; Fault tolerant systems; Hardware; Message passing; Message systems; Multiprocessor interconnection networks; Parallel processing; Real time systems; Resumes;
Conference_Title :
Databases, Parallel Architectures and Their Applications, PARBASE-90, International Conference on
Conference_Location :
Miami Beach, FL
Print_ISBN :
0-8186-2035-8
DOI :
10.1109/PARBSE.1990.77144