  • DocumentCode
    2130058
  • Title
    How to recover efficiently and asynchronously when optimism fails
  • Author
    Damani, Om P. ; Garg, Vijay K.

  • Author_Institution
    Dept. of Comput. Sci., Texas Univ., Austin, TX, USA
  • fYear
    1996
  • fDate
    27-30 May 1996
  • Firstpage
    108
  • Lastpage
    115
  • Abstract
    We propose a new algorithm for recovering asynchronously from failures in a distributed computation. Our algorithm is based on two novel concepts: a fault-tolerant vector clock to maintain causality information in spite of failures, and a history mechanism to detect orphan states and obsolete messages. These two mechanisms, together with checkpointing and message-logging, are used to restore the system to a consistent state after a failure of one or more processes. Our algorithm is completely asynchronous. It handles multiple failures, does not assume any message ordering, causes the minimum amount of rollback, and restores the maximum recoverable state with low overhead. Earlier optimistic protocols lack one or more of the above properties.
  • Keywords
    distributed processing; fault tolerant computing; system recovery; causality information; checkpointing; distributed computation; fault-tolerant vector clock; history mechanism; maximum recoverable state; message-logging; multiple failures; optimistic protocols; recovering asynchronously; rollback; Checkpointing; Clocks; Costs; Distributed computing; Fault detection; Fault tolerance; History; Protocols
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Distributed Computing Systems, 1996., Proceedings of the 16th International Conference on
  • Print_ISBN
    0-8186-7399-0
  • Type
    conf
  • DOI
    10.1109/ICDCS.1996.507907
  • Filename
    507907