• DocumentCode
    1244039
  • Title

    A distributed system-level diagnosis algorithm for arbitrary network topologies

  • Author

    Rangarajan, Sampath ; Dahbura, Anton T. ; Ziegler, Eric A.

  • Author_Institution
    Dept. of Electr. & Comput. Eng., Northeastern Univ., Boston, MA, USA
  • Volume
    44
  • Issue
    2
  • fYear
    1995
  • fDate
    2/1/1995
  • Firstpage
    312
  • Lastpage
    334
  • Abstract
    A distributed algorithm is described for detecting and diagnosing faulty processors in an arbitrary network. Fault-free processors perform simple periodic tests on one another; when a fault is detected or a newly repaired processor joins the network, this new information is disseminated in parallel throughout the network. It is formally proven that the algorithm is correct, and it is also shown that the algorithm is optimal in terms of the time required for all of the fault-free processors in the network to learn of a new event. Simulation results are given for arbitrary network topologies.
  • Keywords
    computer debugging; distributed algorithms; fault tolerant computing; program verification; reliability; algorithm correctness; arbitrary network topologies; distributed system-level diagnosis algorithm; fault free processors; faulty processors; periodic tests; Computer networks; Distributed algorithms; Distributed computing; Fault detection; Fault diagnosis; Military computing; Network topology; Performance evaluation; System testing; Workstations;
  • fLanguage
    English
  • Journal_Title
    Computers, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    0018-9340
  • Type

    jour

  • DOI
    10.1109/12.364542
  • Filename
    364542