DocumentCode :
2955118
Title :
Transparent fault tolerance middleware at user level
Author :
Castro, Marcela ; Rexachs, Dolores ; Luque, Emilio
Author_Institution :
Comput. Archit. & Oper. Syst. Dept., Univ. Autonoma de Barcelona, Barcelona, Spain
fYear :
2012
fDate :
2-6 July 2012
Firstpage :
566
Lastpage :
572
Abstract :
We present the design of a transparent fault tolerance middleware for message passing applications. The approach consists of transforming the interconnections used by the application into reliable ones and supporting a log-based rollback recovery protocol. When one of the nodes of the cluster fails, the processes are recovered on a new node and the connections are reestablished. All of this work is done automatically and transparently to the application. The service can optionally be activated at runtime at user level. The models used for protecting and recovering the application and for detecting failures are based on the RADIC architecture. We have tested this middleware by executing master-worker (M/W) and SPMD applications, which follow different communication patterns.
Keywords :
message passing; middleware; software fault tolerance; system recovery; RADIC architecture; SPMD applications; communication patterns; failure detection; log-based rollback recovery protocol; master-worker applications; message passing applications; transparent fault tolerance middleware; user level; Fault tolerance; Fault tolerant systems; Libraries; Observers; Peer to peer computing; Protocols; Sockets; Fault-tolerance; High-Availability; RADIC; parallel computing;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
High Performance Computing and Simulation (HPCS), 2012 International Conference on
Conference_Location :
Madrid
Print_ISBN :
978-1-4673-2359-8
Type :
conf
DOI :
10.1109/HPCSim.2012.6266974
Filename :
6266974