  • DocumentCode
    2875563
  • Title
    Monitoring Local Progress with Watchdog Timers Deduced from Global Properties
  • Author
    Barbosa, Raul
  • Author_Institution
    Dept. of Comput. Sci. & Eng., Chalmers Univ. of Technol., Gothenburg, Sweden
  • fYear
    2010
  • fDate
    Oct. 31-Nov. 3, 2010
  • Firstpage
    131
  • Lastpage
    140
  • Abstract
    Distributed systems are used in numerous applications where failures can be costly. Due to concerns that some of the nodes may become faulty, critical services are usually replicated across several nodes, which execute distributed algorithms to ensure correct service in spite of failures. To prevent replica-exhaustion, it is fundamental to detect errors and trigger appropriate recovery actions. In particular, it is important to detect situations in which nodes cease to execute the intended algorithm, e.g., when a replica is compromised by an attacker or when a hardware fault causes the node to behave erratically. This paper proposes a method for monitoring the local execution of nodes using watchdog timers. The approach consists in deducing, from the global system properties, local states that must be visited periodically by nodes that execute the intended algorithm correctly. When a node fails to trigger a watchdog before the time limit, an appropriate response can be initiated. The approach is applied to a well-known Byzantine consensus algorithm. The algorithm is modeled in the Promela language and the Spin model checker is used to identify local states that must be visited periodically by correct nodes. Such states are suitable for online monitoring using watchdog timers.
  • Keywords
    security of data; Promela language; critical services; distributed algorithms; distributed systems; global properties; hardware fault causes; monitoring local progress; replica exhaustion; Spin model checker; watchdog timers; Computational modeling; Fault tolerance; Lead; Monitoring; Process control; Protocols; Timing; intrusion tolerance; model checking; online monitoring; watchdogs
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Reliable Distributed Systems, 2010 29th IEEE Symposium on
  • Conference_Location
    New Delhi
  • ISSN
    1060-9857
  • Print_ISBN
    978-0-7695-4250-8
  • Type
    conf
  • DOI
    10.1109/SRDS.2010.23
  • Filename
    5623387