Title :
CCK: An Improved Coordinated Checkpoint/Rollback Protocol for Dataflow Applications in Kaapi
Author :
Besseron, Xavier ; Jafar, Samir ; Gautier, Thierry ; Roch, Jean-Louis
Author_Institution :
Lab. ID-IMAG, Projet MOAIS (CNRS/INPG/INRIA/UJF), Montbonnot
Abstract :
Fault tolerance protocols play an important role in today's long-running scientific parallel applications, because the probability of failure can be significant given the number of unreliable components involved in a simulation. In this paper we present our approach and preliminary results for a new checkpoint/recovery protocol based on a coordinated scheme. This protocol is tightly coupled to the availability of an abstract representation of the execution
Keywords :
checkpointing; data flow computing; data flow graphs; software fault tolerance; KAAPI application; coordinated checkpoint/rollback protocol; dataflow application; dataflow graph; execution abstract representation; fault tolerance protocols; runtime scientific parallel application; Computational modeling; Concurrent computing; Context modeling; Fault tolerance; Fault tolerant systems; Large-scale systems; Middleware; Protocols; Runtime; Virtual reality; Checkpoint/Recovery; Dataflow Graph; Parallel Application
Conference_Title :
Information and Communication Technologies, 2006. ICTTA '06. 2nd
Conference_Location :
Damascus
Print_ISBN :
0-7803-9521-2
DOI :
10.1109/ICTTA.2006.1684955