  • DocumentCode
    3321807
  • Title
    An efficient checkpointing algorithm for distributed systems implementing reliable communication channels
  • Author
    Gendelman, Eugene ; Bic, Lubomir F. ; Dillencourt, Michael B.
  • Author_Institution
    Dept. of Inf. & Comput. Sci., California Univ., Irvine, CA, USA
  • fYear
    1999
  • fDate
    1999
  • Firstpage
    290
  • Lastpage
    291
  • Abstract
    This paper presents a new checkpointing algorithm that guarantees the semantics of reliable communication channels despite the crash and recovery of processes. The algorithm requires O(n+m) communication messages, where n is the number of participating processes and m is the number of "late" messages. It is nonblocking, requires minimal message logging, and has minimal stable storage requirements. It is also scalable, simple, and transparent to the user, and it facilitates fast recovery. By introducing a suitable delay into the checkpointing process, the parameter m can be kept small. We also describe a variant of the algorithm that requires only O(n) messages, at the cost of O(n) additional storage per process.
  • Keywords
    delays; distributed processing; software fault tolerance; system recovery; checkpointing algorithm; communication messages; delay; distributed systems; message logging; nonblocking algorithm; reliable communication channels; semantics; stable storage requirements; system crash; system recovery; Checkpointing; Clocks; Communication channels; Computer science; Costs; Identity-based encryption; Message passing; Protocols; Synchronization; TCPIP;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Reliable Distributed Systems, 1999. Proceedings of the 18th IEEE Symposium on
  • Conference_Location
    Lausanne
  • ISSN
    1060-9857
  • Print_ISBN
    0-7695-0290-3
  • Type
    conf
  • DOI
    10.1109/RELDIS.1999.805105
  • Filename
    805105