DocumentCode :
2905011
Title :
A Novel Process Migration Method for MPI Applications
Author :
Liu, Tiantian ; Ma, Zhong ; Ou, Zhonghong
Author_Institution :
Dept. of Dependability, Comput., Wuhan Digital Eng. Inst., Wuhan, China
fYear :
2009
fDate :
16-18 Nov. 2009
Firstpage :
247
Lastpage :
251
Abstract :
Though a lot of research has been done on fault tolerance for MPI applications, process migration has not gained widespread use because of the complexity of the requirement that the location of a migrated process must be made known to every other process in the MPI application. In this paper, we present a novel and effective process migration method for MPI applications. We implement a prototype called LAM/Migration, which is based on LAM/MPI + BLCR, to provide transparent process migration for MPI applications; the migration mechanism is built into LAM/MPI. With our method, all processes in an MPI application, including mpirun and the MPI processes, can be migrated to any user-specified set of spare nodes in the cluster in case of node failure. Performance evaluation results show that the checkpoint overhead is similar to that of plain LAM/MPI + BLCR, and that the migration method is feasible and promising for overcoming node failures in large-scale parallel computing. By using LAM/Migration, high availability and reliability of parallel computation can be achieved.
Keywords :
application program interfaces; message passing; parallel programming; software fault tolerance; software performance evaluation; LAM/Migration; MPI applications; availability; large-scale parallel computing; message passing interface; mpirun; performance evaluation; reliability; transparent process migration; Availability; Concurrent computing; Fault tolerance; Fault tolerant systems; Knowledge engineering; Large-scale systems; Message passing; Parallel processing; Prototypes; Research and development; LAM/MPI; checkpoint; distributed system; fault tolerance; high availability; process migration;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Dependable Computing, 2009. PRDC '09. 15th IEEE Pacific Rim International Symposium on
Conference_Location :
Shanghai
Print_ISBN :
978-0-7695-3849-5
Type :
conf
DOI :
10.1109/PRDC.2009.46
Filename :
5368724