Title :
Hardware-supported asynchronous checkpointing scheme
Author :
Chiu, J.-F. ; Chiu, G.-M.
Author_Institution :
Dept. of Electr. Eng. & Technol., Nat. Taiwan Univ. of Sci. & Technol., Taipei, Taiwan
fDate :
3/1/1998
Abstract :
The authors propose a hardware-supported scheme to facilitate fast checkpointing and failure recovery operations. The mechanism uses a small bank of nonvolatile memory to save an incremental checkpoint for a processor, so that the time overhead incurred by checkpointing can be reduced. A parity technique is employed to compress checkpointing information. An important feature of the scheme is that the checkpointing operation is dissociated from the parity update action. As a result, checkpointing latency is not affected by the speed of parity update activities, and is thus reduced. Moreover, the scheme does not require an atomic action for updating the parity data. Furthermore, it allows each processor to initiate a checkpoint independently of the others. Experimental results show that the overhead of the mechanism is small and is not sensitive to the number of checkpoints taken by the processors. This observation suggests that the proposed hardware-supported scheme is promising for improving the performance of checkpoint/rollback-recovery systems.
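The parity idea mentioned in the abstract can be illustrated with a generic XOR-parity sketch. This is not the authors' hardware design, which the abstract does not detail. It assumes equal-length checkpoint blocks, and the function names (`parity_of`, `update_parity`, `recover`) are hypothetical. It shows that one parity block can stand in for several checkpoints, and that a parity update needs only the old and new versions of one processor's checkpoint, which is what would allow it to be applied separately from the checkpoint write itself.

```python
from functools import reduce


def xor_blocks(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length blocks (equal length is assumed)."""
    return bytes(x ^ y for x, y in zip(a, b))


def parity_of(checkpoints: list[bytes]) -> bytes:
    """Parity block: XOR of every processor's checkpoint block."""
    return reduce(xor_blocks, checkpoints)


def update_parity(parity: bytes, old_ckpt: bytes, new_ckpt: bytes) -> bytes:
    """Fold one processor's new checkpoint into the parity.

    Only that processor's old and new blocks are needed, so the update
    can be deferred until after the checkpoint itself has been saved.
    """
    return xor_blocks(parity, xor_blocks(old_ckpt, new_ckpt))


def recover(parity: bytes, surviving: list[bytes]) -> bytes:
    """Rebuild the single missing checkpoint from parity and the survivors."""
    return reduce(xor_blocks, surviving, parity)


# Hypothetical three-processor example.
checkpoints = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
parity = parity_of(checkpoints)

# Processor 1 fails: its checkpoint is rebuilt from the other two plus parity.
rebuilt = recover(parity, [checkpoints[0], checkpoints[2]])

# Processor 0 takes a new checkpoint independently of the others.
new_ckpt0 = b"\xaa\xbb"
parity_after = update_parity(parity, checkpoints[0], new_ckpt0)
```

Here `rebuilt` equals the lost block `checkpoints[1]`, and `parity_after` matches the parity recomputed from scratch over the updated set of checkpoints.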
Keywords :
fault tolerant computing; multiprocessing systems; parallel architectures; system recovery; asynchronous checkpointing; checkpointing; failure recovery; fault tolerance; multicomputer system; nonvolatile memory; rollback-recovery
Journal_Title :
Computers and Digital Techniques, IEE Proceedings -
DOI :
10.1049/ip-cdt:19981908