• DocumentCode
    1303655
  • Title

    A hierarchical adaptive distributed system-level diagnosis algorithm

  • Author

    Duarte, Elias Procópio ; Nanya, Takashi

  • Author_Institution
    Dept. of Inf., Fed. Univ. of Paraná, Curitiba, Brazil
  • Volume
    47
  • Issue
    1
  • fYear
    1998
  • fDate
    1/1/1998
  • Firstpage
    34
  • Lastpage
    45
  • Abstract
    Consider a system composed of N nodes that can be faulty or fault-free. The purpose of distributed system-level diagnosis is to have each fault-free node determine the state of all nodes of the system. This paper presents a Hierarchical Adaptive Distributed System-level Diagnosis (Hi-ADSD) algorithm, which is a fully distributed algorithm that allows every fault-free node to achieve diagnosis in, at most, $(\log_2 N)^2$ testing rounds. Nodes are mapped into progressively larger logical clusters, so that tests are run in a hierarchical fashion. Each node executes its tests independently of the other nodes, i.e., tests are run asynchronously. All the information that nodes exchange is diagnostic information. The algorithm assumes no link faults, a fully-connected network and imposes no bounds on the number of faults. Both the worst-case diagnosis latency and correctness of the algorithm are formally proved. As an example application, the algorithm was implemented on a 37-node Ethernet LAN, integrated to a network management system based on SNMP (Simple Network Management Protocol). Experimental results of fault and repair diagnosis are presented. This implementation by itself is also a significant contribution, for, although fault management is a key functional area of network management systems, currently deployed applications often implement only rudimentary diagnosis mechanisms. Furthermore, experimental results are given through simulation of the algorithm for large systems of 64 nodes and 512 nodes.
  • Keywords
    computer network management; distributed algorithms; fault diagnosis; local area networks; Ethernet LAN; fully distributed algorithm; hierarchical adaptive distributed system-level diagnosis algorithm; logical clusters; worst-case diagnosis latency; Adaptive systems; Clustering algorithms; Delay; Distributed algorithms; Ethernet networks; Fault diagnosis; Local area networks; Logic testing; Protocols; System testing;
  • fLanguage
    English
  • Journal_Title
    Computers, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    0018-9340
  • Type

    jour

  • DOI
    10.1109/12.656078
  • Filename
    656078
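
The worst-case latency bound quoted in the abstract, (log2 N)^2 testing rounds, can be evaluated for the two simulated system sizes. The sketch below is only arithmetic on that bound and does not implement Hi-ADSD itself. The function name `worst_case_rounds` is hypothetical, and restricting N to powers of two is an assumption made here so that log2 N is an integer; the abstract does not state how the bound is applied to other sizes such as the 37-node LAN.

```python
def worst_case_rounds(n: int) -> int:
    """Evaluate the (log2 N)^2 bound from the abstract.

    Assumption (not from the source): n is a power of two,
    so that log2(n) is an exact integer.
    """
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    levels = n.bit_length() - 1  # exact log2(n) for a power of two
    return levels * levels


# The two simulated system sizes mentioned in the abstract.
for n in (64, 512):
    print(f"N = {n}: at most {worst_case_rounds(n)} testing rounds")
```

For N = 64 the bound is 6^2 = 36 rounds, and for N = 512 it is 9^2 = 81 rounds.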