  • DocumentCode
    3423475
  • Title
    A timestamp-based checkpointing protocol for long-lived distributed computations
  • Author
    Cristian, Flaviu; Jahanian, Farnam
  • Author_Institution
    IBM Almaden Res. Center, San Jose, CA, USA
  • fYear
    1991
  • fDate
    30 Sep-2 Oct 1991
  • Firstpage
    12
  • Lastpage
    20
  • Abstract
    The authors present a timestamp-based protocol for checkpointing the global state of a long-lived distributed computation in an environment in which processor clocks are approximately synchronized. The protocol is based on periodic checkpointing of local process states and logging of incoming messages during a short bounded interval. It tolerates process crash and performance failures as well as network omission and performance failures. The proposed approach has the advantage of optimistic logging protocols in that it does not require synchronous logging of each message on stable storage. The approach also has the advantage of pessimistic logging protocols in that it avoids the domino effect by recovering to the most recent successful local checkpoint.
  • Keywords
    performance evaluation; protocols; domino effect; long-lived distributed computations; optimistic logging protocols; performance failures; process crash; processor clocks; timestamp-based checkpointing protocol; Checkpointing; Clocks; Communication channels; Computer crashes; Degradation; Distributed computing; Hardware; Protocols; Synchronization; System recovery
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Proceedings of the Tenth Symposium on Reliable Distributed Systems, 1991
  • Conference_Location
    Pisa
  • Print_ISBN
    0-8186-2260-1
  • Type
    conf
  • DOI
    10.1109/RELDIS.1991.145399
  • Filename
    145399