DocumentCode :
3598692
Title :
CLIP: A Checkpointing Tool for Message Passing Parallel Programs
Author :
Chen, Yuqun ; Li, Kai ; Plank, James S.
Author_Institution :
Princeton University
fYear :
1997
Firstpage :
33
Lastpage :
33
Abstract :
Checkpointing is a useful technique for rollback recovery. We present CLIP, a user-level library that provides semi-transparent checkpointing for parallel programs on the Intel Paragon multicomputer. Building a practical checkpointing tool for a machine as complex as the Paragon is not easy, because many issues arise that require careful design decisions. We detail what these decisions are and how they were made in CLIP. We present performance data from checkpointing several long-running parallel applications. These results show that a convenient, general-purpose checkpointing tool like CLIP can provide fault tolerance on a massively parallel multicomputer with good performance.
Keywords :
Checkpointing; Message passing;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Supercomputing, ACM/IEEE 1997 Conference
Print_ISBN :
0-89791-985-8
Type :
conf
DOI :
10.1109/SC.1997.10034
Filename :
1592614