DocumentCode :
3010471
Title :
Mutual-Aid: Diskless Checkpointing Scheme for Tolerating Double Faults
Author :
Chiu, Jane-Ferng ; Hao, Wei-Hua
Author_Institution :
Dept. of Inf. Technol. & Commun., Tungnan Univ., Taipei
fYear :
2008
fDate :
25-27 Sept. 2008
Firstpage :
540
Lastpage :
547
Abstract :
Tolerating double faults is an important issue for diskless checkpointing because of growing system sizes and longer execution times. Mutual-Aid checkpointing is the first scheme to achieve this goal. It combines the advantages of neighbor-based and parity-based diskless approaches, and it tolerates all double processor faults by taking the bitwise exclusive-or of snapshots from each processor's neighbor processors in its virtual assistant ring. Because mutual-aid checkpointing and recovery are simple and efficient, the scheme improves performance, reduces application running time, and allows more frequent checkpoints. Moreover, because its methods are distributed and its operations localized, it can be employed in very large-scale, high-performance computing systems. It achieves a higher degree of fault tolerance than other schemes.
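The abstract says only that snapshots from a processor's neighbors in a virtual ring are combined with bitwise exclusive-or. The Python sketch below is a hypothetical interpretation, not the authors' algorithm. It assumes each node i keeps its own in-memory snapshot C[i] plus the parity C[i-1] XOR C[i+1] of its two ring neighbors, and that a failed node loses both. The function names (`take_checkpoint`, `recover`, `simulate_double_fault`) are illustrative only.

```python
from __future__ import annotations

from itertools import combinations


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise exclusive-or of two equal-length snapshots."""
    return bytes(x ^ y for x, y in zip(a, b))


def take_checkpoint(snapshots: list[bytes]) -> list[bytes]:
    """ASSUMED layout (hypothetical): node i stores parity[i] = C[i-1] XOR C[i+1] on the ring."""
    n = len(snapshots)
    return [xor_bytes(snapshots[(i - 1) % n], snapshots[(i + 1) % n]) for i in range(n)]


def recover(n: int, snaps: dict[int, bytes], parity: dict[int, bytes]) -> list[bytes]:
    """Rebuild lost snapshots using only what survivors still hold.

    `snaps` and `parity` contain entries for surviving nodes only, because a
    failed node loses both its snapshot and the parity it was keeping.
    Any surviving parity with exactly one unknown neighbor yields that neighbor;
    repeat until nothing new can be recovered.
    """
    known = dict(snaps)
    progress = True
    while progress and len(known) < n:
        progress = False
        for k, p in parity.items():
            left, right = (k - 1) % n, (k + 1) % n
            if left in known and right not in known:
                known[right] = xor_bytes(p, known[left])
                progress = True
            elif right in known and left not in known:
                known[left] = xor_bytes(p, known[right])
                progress = True
    if len(known) < n:
        raise RuntimeError("failure pattern not recoverable under this layout")
    return [known[i] for i in range(n)]


def simulate_double_fault(snapshots: list[bytes], i: int, j: int) -> list[bytes]:
    """Take a checkpoint, lose nodes i and j entirely, then recover all snapshots."""
    n = len(snapshots)
    parity = take_checkpoint(snapshots)
    alive = [k for k in range(n) if k not in (i, j)]
    return recover(n, {k: snapshots[k] for k in alive}, {k: parity[k] for k in alive})


if __name__ == "__main__":
    n = 6
    snaps = [bytes([k, 3 * k, 255 - k]) for k in range(n)]
    # Check every possible pair of failed nodes on a 6-node ring.
    ok = all(simulate_double_fault(snaps, i, j) == snaps for i, j in combinations(range(n), 2))
    print(ok)  # True
```

Under this assumed layout, every pair of failures is recoverable once the ring has at least five nodes. With four nodes, losing two opposite nodes (0 and 2) leaves both surviving parities equal to C[0] XOR C[2], so neither lost snapshot can be isolated.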
Keywords :
checkpointing; fault tolerant computing; parallel processing; diskless checkpointing scheme; double processor fault tolerance; high performance computing; mutual-aid checkpointing; system recovery; virtual assistant ring; Application software; Checkpointing; Concurrent computing; Fault tolerance; Grid computing; Hard disks; High performance computing; Information technology; Large-scale systems; Parallel processing; diskless checkpointing; fault tolerance;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
High Performance Computing and Communications, 2008. HPCC '08. 10th IEEE International Conference on
Conference_Location :
Dalian
Print_ISBN :
978-0-7695-3352-0
Type :
conf
DOI :
10.1109/HPCC.2008.123
Filename :
4637744