Regional Information Center for Science and Technology - RDMA-Based Job Migration Framework for MPI over InfiniBand

DocumentCode :

2549180

Title :

RDMA-Based Job Migration Framework for MPI over InfiniBand

Author :

Ouyang, Xiangyong ; Marcarelli, Sonya ; Rajachandrasekar, Raghunath ; Panda, Dhabaleswar K.

Author_Institution :

Dept. of Comput. Sci. & Eng., Ohio State Univ., Columbus, OH, USA

fYear :

2010

fDate :

20-24 Sept. 2010

Firstpage :

116

Lastpage :

125

Abstract :

Coordinated checkpoint and recovery is a common approach to achieving fault tolerance on large-scale systems. The traditional mechanism dumps the process images of all the processes involved in the parallel job to a local disk or a central storage area. When a failure occurs, the processes are restarted and restored from the latest checkpoint image. However, this approach cannot provide the scalability required by increasingly large jobs, since it puts a heavy I/O burden on the storage subsystem, and resubmitting a job during the restart phase incurs a lengthy queuing delay. In this paper, we enhance the fault tolerance of MVAPICH2, an open-source high-performance MPI-2 implementation, by using a proactive job migration scheme. Instead of checkpointing all the processes of the job and saving their process images to stable storage, we transfer the processes running on a health-deteriorating node to a healthy spare node and resume these processes on the spare node. RDMA-based process image transmission is designed to take advantage of high-performance communication in InfiniBand. Experimental results show that the Job Migration scheme achieves a speedup of 4.49 times over the Checkpoint/Restart scheme in handling a node failure for a 64-process application running on 8 compute nodes. To the best of our knowledge, this is the first such job migration design for InfiniBand-based clusters.

Keywords :

application program interfaces; checkpointing; fault tolerant computing; file organisation; large-scale systems; pattern clustering; InfiniBand; MPI-2; RDMA; checkpointing; fault tolerance; image transmission; large-scale systems; proactive job migration scheme; restart scheme; Backplanes; Context; Fault tolerance; Fault tolerant systems; IP networks; Kernel; Libraries; Checkpoint; MVAPICH2; Proactive Fault Tolerance; Process-Migration;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Cluster Computing (CLUSTER), 2010 IEEE International Conference on

Conference_Location :

Heraklion, Crete

Print_ISBN :

978-1-4244-8373-0

Electronic_ISBN :

978-0-7695-4220-1

Type :

conf

DOI :

10.1109/CLUSTER.2010.20

Filename :

5600314

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2549180