DocumentCode :
3285930
Title :
The Performance of Erasure Codes Used in FT-MPI
Author :
Xiaoguang, Liu ; Gang, Wang ; Yu, Zhang ; Ang, Li ; Fang, Xie
Author_Institution :
Nankai-Baidu Joint Lab., Nankai Univ., Tianjin, China
Volume :
3
fYear :
2009
fDate :
15-17 May 2009
Firstpage :
360
Lastpage :
363
Abstract :
Today, high performance computing (HPC) systems are larger than ever; some consist of thousands or even tens of thousands of processors. This scale raises the challenge of how to deal with process failures. The most important programming tool for HPC is MPI (Message Passing Interface). Several existing approaches provide fault tolerance in the MPI context, such as MPICH-V, StarFish and MPI/FT, and most of them write checkpoints to disk. In this paper, erasure codes usually used in RAID systems are applied to provide fault tolerance in memory. Based on the Fault Tolerant MPI (FT-MPI) platform, RAID4, RAID5, RDP and X-code are implemented to perform in-memory checkpointing. The experimental results show that RDP is feasible for double-fault-tolerant in-memory checkpointing.
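To make the double-fault tolerance of RDP (Row-Diagonal Parity) concrete, here is a minimal sketch, not the authors' FT-MPI implementation. It assumes each column stands for one process's in-memory checkpoint split into p-1 blocks (p prime), with integers standing in for byte buffers; the function names `rdp_encode` and `rdp_recover` are hypothetical. Recovery uses a generic "peeling" loop (solve any parity equation with exactly one unknown block) in place of a hand-written chain walk.

```python
from functools import reduce
from operator import xor


def rdp_encode(data, p):
    """Encode p-1 data columns (p prime) into a p+1 column RDP stripe.

    data[c][r] is block r of data column c; each column holds p-1 blocks.
    Returns the data columns, then the row-parity column, then the diagonal-parity column.
    """
    rows = p - 1
    assert len(data) == rows and all(len(col) == rows for col in data)
    # Row parity (as in RAID4): XOR of the data blocks in each row.
    row_parity = [reduce(xor, (data[c][r] for c in range(rows)), 0) for r in range(rows)]
    cols = [list(col) for col in data] + [row_parity]
    # Diagonal parity: block (c, r) of columns 0..p-1 lies on diagonal (r + c) % p.
    # Only diagonals 0..p-2 are stored; diagonal p-1 is deliberately left out.
    diag = [0] * rows
    for c in range(p):
        for r in range(rows):
            d = (r + c) % p
            if d != p - 1:
                diag[d] ^= cols[c][r]
    return cols + [diag]


def rdp_recover(stripe, p):
    """Rebuild up to two lost columns (passed as None) of an RDP stripe."""
    rows = p - 1
    known = {(c, r): stripe[c][r]
             for c in range(p + 1) if stripe[c] is not None
             for r in range(rows)}
    # Each parity equation is a list of cells whose XOR must be zero.
    # Row equations cover the data columns plus the row-parity column.
    equations = [[(c, r) for c in range(p)] for r in range(rows)]
    for d in range(rows):
        cells = [(c, r) for c in range(p) for r in range(rows) if (r + c) % p == d]
        equations.append(cells + [(p, d)])  # add the stored diagonal-parity block
    # Peeling: repeatedly solve any equation that has exactly one unknown cell.
    progress = True
    while progress:
        progress = False
        for eq in equations:
            missing = [cell for cell in eq if cell not in known]
            if len(missing) == 1:
                known[missing[0]] = reduce(xor, (known[cell] for cell in eq if cell in known), 0)
                progress = True
    # Raises KeyError if some block could not be reconstructed.
    return [[known[(c, r)] for r in range(rows)] for c in range(p + 1)]
```

For example, with p = 5, four processes each contribute four checkpoint blocks, and the six-column stripe survives the loss of any two columns. For RDP with prime p, the peeling loop reaches every lost block, following the same row/diagonal chains as RDP's reconstruction procedure. A real in-memory checkpoint would XOR whole byte buffers across processes (for instance with `MPI_Reduce` and the `MPI_BXOR` operation) rather than Python integers.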
Keywords :
message passing; software fault tolerance; software reliability; RAID systems; checkpoint in-memory; erasure codes; fault-tolerance-MPI platform; high performance computing system; message passing interface; programming tool; Application software; Computer science; Educational institutions; Fault tolerance; Hardware; High performance computing; Information technology; Libraries; Message passing; Processor scheduling; MPI; RDP; X-Code; fault-tolerance;
fLanguage :
English
Publisher :
ieee
Conference_Title :
Information Technology and Applications, 2009. IFITA '09. International Forum on
Conference_Location :
Chengdu
Print_ISBN :
978-0-7695-3600-2
Type :
conf
DOI :
10.1109/IFITA.2009.185
Filename :
5232135