DocumentCode :
2809041
Title :
Checkpointing Message-Passing Interface (MPI) parallel programs
Author :
Li, Wei-Jih ; Tsay, Jyh-Jong
Author_Institution :
Dept. of Comput. Sci. & Inf. Eng., Nat. Chung Cheng Univ., Chiayi, Taiwan
fYear :
1997
fDate :
15-16 Dec 1997
Firstpage :
147
Lastpage :
152
Abstract :
Many scientific problems can be distributed over a large number of processes to take advantage of low-cost workstations. In a parallel system, a failure on any processor can halt the computation and require restarting all applications. Checkpointing is a simple technique for recovering a failed execution. The Message Passing Interface (MPI) is a standard proposed for writing portable message-passing parallel programs. In this paper, we present a checkpointing implementation for MPI programs that is transparent and requires no changes to the application programs. Our implementation combines coordinated, uncoordinated, and message-logging techniques.
Keywords :
message passing; parallel programming; program testing; software portability; Message Passing Interface; checkpointing; message-passing; parallel programs; parallel systems; Availability; Checkpointing; Communications technology; Computer science; Concurrent computing; Costs; Degradation; Message passing; Workstations; Writing;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Fault-Tolerant Systems, 1997. Proceedings., Pacific Rim International Symposium on
Conference_Location :
Taipei
Print_ISBN :
0-8186-8212-4
Type :
conf
DOI :
10.1109/PRFTS.1997.640140
Filename :
640140