Title :
A communication-induced checkpointing algorithm using virtual checkpoint on distributed systems
Author :
Do-Hyung, Kim ; Chang-Soon, Park
Author_Institution :
Electron. & Telecommun. Res. Inst., South Korea
Abstract :
Checkpointing is a fault-tolerance technique for recovering from faults and restarting jobs quickly. Checkpointing algorithms for distributed systems have been studied for years and can be classified into three types: coordinated, uncoordinated, and communication-induced. In this paper we propose a new communication-induced checkpointing algorithm whose checkpoint count is minimal, equal to that of the periodic checkpointing algorithm, and whose rollback distance when a fault occurs is relatively short. We compare the proposed algorithm with previously proposed communication-induced checkpointing algorithms by simulation. In the simulation, the proposed algorithm outperforms the other algorithms in task completion time in both fault-free and faulty situations.
Keywords :
distributed processing; fault tolerant computing; system recovery; virtual machines; communication-induced checkpointing algorithm; distributed systems; rollback distance; simulation; task completion time; virtual checkpoint; Checkpointing; Communication system control; Degradation; Fault tolerant systems; Force control; Hardware; Terminology
Conference_Title :
Parallel and Distributed Systems, 2000. Proceedings. Seventh International Conference on
Conference_Location :
Iwate
Print_ISBN :
0-7695-0568-6
DOI :
10.1109/ICPADS.2000.857693