• DocumentCode
    2905011
  • Title

    A Novel Process Migration Method for MPI Applications

  • Author

    Liu, Tiantian ; Ma, Zhong ; Ou, Zhonghong

  • Author_Institution
    Dept. of Dependability, Comput., Wuhan Digital Eng. Inst., Wuhan, China
  • fYear
    2009
  • fDate
    16-18 Nov. 2009
  • Firstpage
    247
  • Lastpage
    251
  • Abstract
    Though a lot of research has been done on fault tolerance for MPI applications, process migration has not gained widespread use because of the complexity of the requirement that the location of a migrated process be made known to every other process in the MPI application. In this paper, we present a novel and effective process migration method for MPI applications. We implement a prototype called LAM/Migration, which is based on LAM/MPI + BLCR, to provide transparent process migration for MPI applications; the migration mechanism is built into LAM/MPI. In our method, all processes in an MPI application, including mpirun and the MPI processes, can be migrated to any different set of spare nodes in the cluster, as specified by the user, in case of node failure. Performance evaluation results show that the checkpoint overhead is similar to that of plain LAM/MPI + BLCR, and that the migration method is feasible and promising for overcoming node failures in large-scale parallel computing. By using LAM/Migration, high availability and reliability of parallel computation can be achieved.
  • Keywords
    application program interfaces; message passing; parallel programming; software fault tolerance; software performance evaluation; LAM/Migration; MPI applications; availability; large-scale parallel computing; message passing interface; mpirun; performance evaluation; reliability; transparent process migration; Availability; Concurrent computing; Fault tolerance; Fault tolerant systems; Knowledge engineering; Large-scale systems; Message passing; Parallel processing; Prototypes; Research and development; LAM/MPI; checkpoint; distributed system; fault tolerance; high availability; process migration;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Dependable Computing, 2009. PRDC '09. 15th IEEE Pacific Rim International Symposium on
  • Conference_Location
    Shanghai
  • Print_ISBN
    978-0-7695-3849-5
  • Type

    conf

  • DOI
    10.1109/PRDC.2009.46
  • Filename
    5368724