Title :
Portable checkpointing and recovery
Author :
Silva, Luis M. ; Silva, João G. ; Chapple, Simon ; Clarke, Lyndon
Author_Institution :
Dept. de Engenharia Inf., Coimbra Univ., Portugal
Abstract :
This paper presents a checkpointing scheme that was implemented in a parallel library that runs on top of CHIMP/MPI. The main goals of the checkpointing mechanism are portability and efficiency. It runs on every platform supported by MPI in a machine-independent way. The scheme allows the migration of checkpoints and offers a flexible recovery mechanism based on data-reconfiguration. Some performance results will be presented at the end of the paper together with some techniques that can be used to increase the efficiency of the checkpointing mechanism.
Keywords :
operating systems (computers); parallel machines; software portability; system recovery; data-reconfiguration; CHIMP/MPI; flexible recovery mechanism; parallel library; portability; portable checkpointing; recovery; Checkpointing; Computer crashes; Distributed computing; Guidelines; Libraries; Operating systems; Parallel machines; Parallel processing; Proposals; Workstations;
Conference_Title :
High Performance Distributed Computing, 1995., Proceedings of the Fourth IEEE International Symposium on
Conference_Location :
Washington, DC
Print_ISBN :
0-8186-7088-6
DOI :
10.1109/HPDC.1995.518709