DocumentCode :
2482226
Title :
Evaluation of replication and fault detection in P2P-MPI
Author :
Genaud, Stéphane ; Rattanapoka, Choopan
Author_Institution :
AlGorille Team, LORIA, Nancy, France
fYear :
2009
fDate :
23-29 May 2009
Firstpage :
1
Lastpage :
8
Abstract :
We present in this paper an evaluation of fault management in the grid middleware P2P-MPI. One of P2P-MPI's objectives is to support environments built from commodity hardware. Running programs in such environments is therefore failure-prone, and particular attention must be paid to fault management. Fault management covers two issues: fault tolerance and fault detection. P2P-MPI provides a transparent fault-tolerance facility based on replication of computations. Fault detection concerns the monitoring of program execution by the system; this monitoring is done through a distributed set of modules called failure detectors. In this paper, we report results from several experiments that show the overhead of replication and the cost of fault detection.
Keywords :
application program interfaces; fault tolerant computing; peer-to-peer computing; P2P-MPI; commodity hardware; failure detectors; fault detection; fault tolerance; program execution monitoring; replication; Condition monitoring; Costs; Detectors; Educational institutions; Fault detection; Fault tolerance; Grid computing; Middleware; Peer to peer computing; Technology management; Fault-tolerance; Grid computing; Parallelism;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Parallel & Distributed Processing, 2009. IPDPS 2009. IEEE International Symposium on
Conference_Location :
Rome
ISSN :
1530-2075
Print_ISBN :
978-1-4244-3751-1
Electronic_ISBN :
1530-2075
Type :
conf
DOI :
10.1109/IPDPS.2009.5160969
Filename :
5160969