• DocumentCode
    327894
  • Title

    Distributed checkpoint algorithms to avoid roll-back propagation

  • Author

    Zambonelli, Franco

  • Author_Institution
    Dipt. di Sci. dell'Ingegneria, Modena Univ., Italy
  • Volume
    1
  • fYear
    1998
  • fDate
    25-27 Aug 1998
  • Firstpage
    403
  • Abstract
    Checkpointing is a well-known mechanism for achieving fault tolerance. In distributed applications, a local checkpoint is useful for fault tolerance only if it can belong to at least one consistent global checkpoint, so that execution can be restarted from it without rolling back further into the past. The paper introduces a theoretical framework that facilitates the definition and analysis of distributed checkpoint algorithms that avoid roll-back propagation. On this basis, several algorithms are presented and evaluated on a set of testbed applications.
  • Keywords
    distributed algorithms; message passing; software fault tolerance; checkpointing; consistent global checkpoint; distributed applications; distributed checkpoint algorithms; fault tolerance; local checkpoint; roll-back propagation; testbed applications; theoretical framework; Algorithm design and analysis; Checkpointing; Computational modeling; Distributed computing; Fault tolerance; Force control; Process control; Testing
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Euromicro Conference, 1998. Proceedings. 24th
  • Conference_Location
    Vasteras
  • ISSN
    1089-6503
  • Print_ISBN
    0-8186-8646-4
  • Type

    conf

  • DOI
    10.1109/EURMIC.1998.711833
  • Filename
    711833