DocumentCode :
1147999
Title :
ickp: a consistent checkpointer for multicomputers
Author :
Plank, James S. ; Li, Kai
Author_Institution :
Tennessee Univ., Knoxville, TN, USA
Volume :
2
Issue :
2
fYear :
1994
Firstpage :
62
Lastpage :
67
Abstract :
There has been much research on checkpointing algorithms for parallel and distributed systems, but surprisingly few implementations for uniprocessors, multiprocessors, and distributed systems, and none at all for multicomputers. We discuss ickp, our consistent checkpointer for the Intel iPSC/860, which is the first general-purpose checkpointer for a multicomputer. It is a checkpointing library that may be invoked asynchronously from the host processor, at a periodic interval, or by a library call. It implements three consistent checkpointing algorithms, two optimizations to reduce checkpoint time and overhead, and recovery.
Keywords :
fault tolerant computing; message passing; parallel processing; program diagnostics; software reliability; system recovery; Intel iPSC/860; checkpoint time; checkpointing algorithms; checkpointing library; consistent checkpointer; distributed systems; general-purpose checkpointer; host processor; ickp; library call; multicomputers; optimizations; overhead; parallel systems; periodic interval; recovery; Automatic control; Checkpointing; Concurrent computing; Distributed computing; Fault tolerance; Fault tolerant systems; File systems; Libraries; Parallel processing; Registers;
fLanguage :
English
Journal_Title :
Parallel & Distributed Technology: Systems & Applications, IEEE
Publisher :
IEEE
ISSN :
1063-6552
Type :
jour
DOI :
10.1109/88.311574
Filename :
311574