Title :
A distributed system-level diagnosis algorithm for arbitrary network topologies
Author :
Rangarajan, Sampath ; Dahbura, Anton T. ; Ziegler, Eric A.
Author_Institution :
Dept. of Electr. & Comput. Eng., Northeastern Univ., Boston, MA, USA
fDate :
2/1/1995
Abstract :
A distributed algorithm is described for detecting and diagnosing faulty processors in an arbitrary network. Fault-free processors perform simple periodic tests on one another; when a fault is detected or a newly repaired processor joins the network, this new information is disseminated in parallel throughout the network. The algorithm is formally proven correct and is shown to be optimal in terms of the time required for all fault-free processors in the network to learn of a new event. Simulation results are given for arbitrary network topologies.
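Illustrative sketch (not taken from the paper): the abstract gives no pseudocode, so the following is only a generic flooding model of the idea that news of an event spreads in parallel through the fault-free processors of an arbitrary topology. The function name flood_rounds, the round-based timing model, and the six-node example graph are all assumptions made for illustration; the paper's actual testing and dissemination rules may differ.

```python
from collections import deque


def flood_rounds(adjacency, faulty, origin):
    """Return {node: round in which it first learns of the event}.

    Assumed model (not the paper's protocol): in each round, every
    informed fault-free node forwards the event to all of its
    neighbours in parallel; faulty nodes neither receive nor relay.
    This is a breadth-first search restricted to fault-free nodes.
    """
    if origin in faulty:
        raise ValueError("origin must be fault-free")
    learned = {origin: 0}
    frontier = deque([origin])
    while frontier:
        node = frontier.popleft()
        for neighbour in adjacency[node]:
            if neighbour in faulty or neighbour in learned:
                continue
            learned[neighbour] = learned[node] + 1
            frontier.append(neighbour)
    return learned


# Hypothetical 6-node topology in which node 3 is faulty.
adjacency = {
    0: [1, 2],
    1: [0, 3],
    2: [0, 4],
    3: [1, 5],
    4: [2, 5],
    5: [3, 4],
}
rounds = flood_rounds(adjacency, faulty={3}, origin=0)
slowest = max(rounds.values())  # rounds until every reachable fault-free node knows
```

In this toy model, no fault-free node can learn of the event sooner than its hop distance from the origin along fault-free paths, which is the intuition behind measuring optimality by the time for all fault-free processors to learn of a new event.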
Keywords :
computer debugging; distributed algorithms; fault tolerant computing; program verification; reliability; algorithm correctness; arbitrary network topologies; distributed system-level diagnosis algorithm; fault free processors; faulty processors; periodic tests; Computer networks; Distributed algorithms; Distributed computing; Fault detection; Fault diagnosis; Military computing; Network topology; Performance evaluation; System testing; Workstations
Journal_Title :
Computers, IEEE Transactions on