DocumentCode :
3175525
Title :
Probabilistic checkpointing
Author :
Hyo-Chang Nam ; Jong Kim ; SungJe Hong ; Sunggu Lee
Author_Institution :
Dept. of Comput. Sci. & Eng., Pohang Univ. of Sci. & Technol., South Korea
fYear :
1997
fDate :
24-27 June 1997
Firstpage :
48
Lastpage :
57
Abstract :
Many optimization schemes have been proposed to reduce the overhead of checkpointing. Incremental checkpointing based on memory page protection has been one of the successful schemes used to reduce the overhead and to improve the performance of checkpointing. In this paper, we propose two checkpointing schemes, called "block encoding" and "combined block encoding", which further reduce the checkpointing overhead. The smallest unit of checkpoint data in our scheme is a block, which is smaller than a page; this reduces the amount of checkpoint data required compared with page-based incremental checkpointing. One drawback of the proposed schemes is the possibility of aliasing in encoded words. We show, however, that the aliasing probability is near zero when an 8-byte encoded word is used. The performance of the proposed schemes is both analyzed and measured experimentally. First, we construct an analytic model that predicts the checkpointing overhead. Using this model, we can estimate the block size that produces the best performance for a given target program. Next, the proposed schemes are implemented on libckpt, a general-purpose checkpointing library for Unix-based systems that was developed at the University of Tennessee. According to our experimental results, compared with page-based incremental checkpointing, the proposed schemes reduce the overhead by 11.7% in the best case and increase it by 0.5% in the worst case. In most cases, the combined block encoding scheme shows an improvement over both block encoding and page-based incremental checkpointing.
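Illustrative sketch (not from the paper): the abstract's idea of keeping one 8-byte encoded word per block and saving only the blocks whose word has changed can be approximated as follows. The names encode_blocks and changed_blocks, the 256-byte block size, and the use of BLAKE2b as the encoding function are all assumptions made for this example; the abstract does not specify the authors' encoding or block size.

```python
import hashlib

BLOCK_SIZE = 256  # bytes; hypothetical value, the abstract does not give one
WORD_SIZE = 8     # bytes per encoded word, matching the abstract's 8-byte word


def encode_blocks(memory, block_size=BLOCK_SIZE):
    """Return one 8-byte encoded word per block of `memory`.

    BLAKE2b is an assumed stand-in for the paper's encoding function.
    """
    return [
        hashlib.blake2b(memory[i:i + block_size], digest_size=WORD_SIZE).digest()
        for i in range(0, len(memory), block_size)
    ]


def changed_blocks(old_words, memory, block_size=BLOCK_SIZE):
    """Compare fresh encoded words with the previous checkpoint's words.

    Returns the indices of blocks that must be written to the checkpoint,
    plus the new word list to keep for the next checkpoint. A modified
    block whose word happens to be unchanged (aliasing) would be missed.
    """
    new_words = encode_blocks(memory, block_size)
    changed = [
        i for i, word in enumerate(new_words)
        if i >= len(old_words) or word != old_words[i]
    ]
    return changed, new_words


# Usage: a 1 KiB region in which a single byte changes between checkpoints.
region = bytearray(1024)
words = encode_blocks(bytes(region))
region[300] = 1  # falls in block 300 // 256 = 1
changed, words = changed_blocks(words, bytes(region))
```

Under the further assumption that the encoding behaves like an ideal uniform 64-bit hash, a modified block would alias with probability of about 2^-64 per comparison, which is in line with the abstract's "near zero" claim; the paper's own analysis may differ.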
Keywords :
encoding; fault tolerant computing; optimisation; performance evaluation; system recovery; 8-byte encoded word; aliasing probability; analytic model; block encoding; block size; combined block encoding; general-purpose checkpointing library; incremental checkpointing; memory page protection; optimization schemes; performance; probabilistic checkpointing; Checkpointing; Computer science; Costs; Delay; Encoding; Fault tolerant systems; Libraries; Multiprocessing systems; Performance analysis; Protection;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Fault-Tolerant Computing, 1997. FTCS-27. Digest of Papers., Twenty-Seventh Annual International Symposium on
Conference_Location :
Seattle, WA, USA
ISSN :
0731-3071
Print_ISBN :
0-8186-7831-3
Type :
conf
DOI :
10.1109/FTCS.1997.614077
Filename :
614077