• DocumentCode
    1381164
  • Title

    Damage assessment for optimal rollback recovery

  • Author

    Lin, Tein-hsiang ; Shin, Kang G.

  • Author_Institution
    Microtec Graphics, Santa Clara, CA, USA
  • Volume
    47
  • Issue
    5
  • fYear
    1998
  • fDate
    5/1/1998
  • Firstpage
    603
  • Lastpage
    613
  • Abstract
    Conventional schemes of rollback recovery with checkpointing for concurrent processes have overlooked an important problem: contamination of checkpoints as a result of error propagation among the cooperating processes. Error propagation is unavoidable due to imperfect detection mechanisms and random interprocess communications, and it could give rise to contaminated checkpoints which, in turn, result in unsuccessful rollbacks. To counter the problem of error propagation, a damage assessment model is developed to estimate the correctness of saved checkpoints under various circumstances. Using the result of damage assessment, determination of the “optimal” checkpoints for rollback recovery, which minimize the average total recovery overhead, is formulated and solved as a nonlinear integer programming problem. Integration of damage assessment into existing recovery schemes is also discussed.
  • Keywords
    integer programming; parallel programming; system recovery; checkpointing; concurrent processes; cooperating processes; error propagation; imperfect detection mechanisms; random interprocess communications; rollback recovery; Checkpointing; Computer Society; Computer errors; Contamination; Counting circuits; Error correction; Linear programming; Protocols; Resumes; Secure storage
  • fLanguage
    English
  • Journal_Title
    Computers, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    0018-9340
  • Type

    jour

  • DOI
    10.1109/12.677255
  • Filename
    677255