  • DocumentCode
    2563729
  • Title
    Reconfiguration and transient recovery in state machine architectures
  • Author
    Rushby, John
  • Author_Institution
    Comput. Sci. Lab., SRI Int., Menlo Park, CA, USA
  • fYear
    1996
  • fDate
    25-27 Jun 1996
  • Firstpage
    6
  • Lastpage
    15
  • Abstract
    We consider an architecture for ultra-dependable operation based on synchronized state machine replication, extended to provide transient recovery and reconfiguration in the presence of arbitrary faults. The architecture allows processors suspected of being faulty to be placed on “probation.” Processors in this status cannot disrupt other processors, but those that are nonfaulty or recovering from transient faults are able to remain synchronized with the other processors and with each other, can participate in interactively consistent exchange of data (i.e., Byzantine agreement), and can restore damaged state data by loading majority-voted copies from other processors. The processors that are not on probation are able to coordinate membership of their group and to take processors on and off probation. These properties are achieved even if all the processors on probation and some of the others exhibit Byzantine faults, provided a majority of all processors are nonfaulty. Key elements of the architecture are modified treatments for the problems of interactive consistency, clock synchronization, and group membership. Classical algorithms for these problems that tolerate t Byzantine faults among n processors are extended to tolerate t+p faults among n+p processors, partitioned into n “core members” and p “probationers,” provided no more than t faults occur among the core members.
  • Keywords
    automata theory; fault tolerant computing; multiprocessing systems; parallel algorithms; parallel architectures; reconfigurable architectures; synchronisation; system recovery; Byzantine agreement; Byzantine faults; arbitrary faults; clock synchronization; data exchange; group membership; interactive consistency; majority-voted copies; processor probation; state machine architectures; synchronization; synchronized state machine replication; system reconfiguration; transient recovery; Clocks; Computer architecture; Computer science; Contracts; Fault diagnosis; Laboratories; NASA; Redundancy; Synchronization; Voting;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Fault Tolerant Computing, 1996, Proceedings of Annual Symposium on
  • Conference_Location
    Sendai
  • ISSN
    0731-3071
  • Print_ISBN
    0-8186-7262-5
  • Type
    conf
  • DOI
    10.1109/FTCS.1996.534589
  • Filename
    534589
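    The abstract describes two checkable ideas: recovering processors restore damaged state by loading majority-voted copies from other processors, and the extended algorithms work with n core members and p probationers provided no more than t faults occur among the core members and a majority of all processors are nonfaulty. The Python sketch below illustrates only those two ideas under stated assumptions. The helper names (majority_vote, restore_state, within_stated_bounds) are hypothetical, the fall-back to local state when no majority exists is an assumption, and this is not the paper's protocol. It does not model clock synchronization, interactive consistency, or group membership, and it does not check whatever n-versus-t requirement the underlying classical algorithm imposes.

```python
from collections import Counter
from collections.abc import Hashable, Sequence
from typing import Optional


def majority_vote(copies: Sequence[Hashable]) -> Optional[Hashable]:
    """Return the value held by a strict majority of the copies, else None."""
    if not copies:
        return None
    value, count = Counter(copies).most_common(1)[0]
    return value if 2 * count > len(copies) else None


def restore_state(local: Hashable, copies_from_others: Sequence[Hashable]) -> Hashable:
    """Hypothetical recovery step: replace possibly damaged local state with
    the majority-voted copy received from other processors.
    ASSUMPTION: keep the local state if no strict majority exists."""
    voted = majority_vote(copies_from_others)
    return local if voted is None else voted


def within_stated_bounds(n_core: int, p_prob: int,
                         faults_core: int, faults_prob: int, t: int) -> bool:
    """Check only the two conditions stated in the abstract:
    at most t faults among the n core members, and a majority of all
    n + p processors nonfaulty. (Hypothetical helper; it does not check
    the classical algorithm's own requirement relating n and t.)"""
    if not (0 <= faults_core <= n_core and 0 <= faults_prob <= p_prob):
        raise ValueError("fault counts must lie within each group's size")
    total = n_core + p_prob
    nonfaulty = total - (faults_core + faults_prob)
    return faults_core <= t and 2 * nonfaulty > total


if __name__ == "__main__":
    print(restore_state("corrupt", ["s1", "s1", "bad"]))  # s1
    print(within_stated_bounds(5, 2, 1, 2, t=1))           # True
    print(within_stated_bounds(4, 2, 1, 2, t=1))           # False
```

    In the last call, both probationers and one core member are faulty, so only 3 of 6 processors are nonfaulty. That is not a strict majority, so the stated condition fails even though the core-member bound t = 1 holds.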