  • DocumentCode
    2483367
  • Title
    Consensus in asynchronous systems where processes can crash and recover
  • Author
    Hurfin, Michel; Mostéfaoui, Achour; Raynal, Michel
  • Author_Institution
    IRISA, Rennes, France
  • fYear
    1998
  • fDate
    20-23 Oct 1998
  • Firstpage
    280
  • Lastpage
    286
  • Abstract
    The consensus problem is now recognized as one of the most important problems in the design and construction of fault-tolerant distributed systems. It is defined as follows: processes have to reach a common decision, which depends on their inputs, despite failures. We consider the consensus problem in asynchronous distributed systems augmented with unreliable failure detectors. Several protocols have been proposed for these systems when process crashes are assumed to be definitive. This paper addresses the consensus problem in a more practical asynchronous system model, namely one in which processes can crash and recover. Because a crash entails the loss of a process's volatile memory, each process is equipped with stable storage. To be efficient, a consensus protocol should therefore log as little critical data as possible. The proposed protocol uses a new class of failure detectors suited to the crash/recovery model. It is particularly efficient when the underlying failure detector makes few mistakes, whether or not crashes occur. Additionally, the protocol tolerates message duplication and copes with some message losses.
  • Keywords
    open systems; protocols; software fault tolerance; system recovery; asynchronous systems; consensus problem; consensus protocol; crash/recovery model; critical data; fault-tolerant distributed systems; message duplication; process crash; stable storage; unreliable failure detectors; volatile memory; Ash; Broadcasting; Computer crashes; Context modeling; Detectors; Electrical capacitance tomography; Electronic switching systems; Fault diagnosis; Protocols
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Proceedings of the Seventeenth IEEE Symposium on Reliable Distributed Systems, 1998
  • Conference_Location
    West Lafayette, IN
  • ISSN
    1060-9857
  • Print_ISBN
    0-8186-9218-9
  • Type
    conf
  • DOI
    10.1109/RELDIS.1998.740510
  • Filename
    740510
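The abstract defines consensus informally: processes must reach a common decision that depends on their inputs, despite failures. As an illustration of that definition only (not the protocol proposed in the paper), the sketch below checks the two safety properties usually attached to it on a finished run: agreement (no two processes decide differently) and validity (a decided value is some process's input). The function name `check_consensus` and the dictionary-based run representation are hypothetical; termination is a liveness property and is not checked here.

```python
from collections.abc import Hashable, Mapping
from typing import Optional


def check_consensus(proposals: Mapping[str, Hashable],
                    decisions: Mapping[str, Optional[Hashable]]) -> bool:
    """Hypothetical checker for the safety part of consensus on one finished run.

    proposals: process id -> input value proposed by that process
    decisions: process id -> decided value, or None if it never decided
               (e.g. it crashed and did not recover in this run)
    """
    decided = [v for v in decisions.values() if v is not None]

    # Agreement: every process that decided chose the same value.
    agreement = len(set(decided)) <= 1

    # Validity: the decision depends on the inputs, i.e. every decided
    # value was proposed by some process.
    proposed = set(proposals.values())
    validity = all(v in proposed for v in decided)

    return agreement and validity
```

For example, `check_consensus({"p1": 0, "p2": 1, "p3": 1}, {"p1": 1, "p2": 1, "p3": None})` holds, since the two deciding processes agree on a proposed value, while a run in which `p1` and `p2` decide 0 and 1 respectively violates agreement.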