DocumentCode :
3013859
Title :
The average availability of parallel checkpointing systems and its importance in selecting runtime parameters
Author :
Plank, J.S. ; Thomason, M.G.
Author_Institution :
Dept. of Comput. Sci., Tennessee Univ., Knoxville, TN, USA
fYear :
1999
fDate :
15-18 June 1999
Firstpage :
250
Lastpage :
257
Abstract :
Performance prediction of checkpointing systems in the presence of failures is a well-studied research area. While the literature abounds with performance models of checkpointing systems, none address the issue of selecting runtime parameters other than the optimal checkpointing interval. In particular, the issue of processor allocation is typically ignored. In this paper we briefly present a performance model for long-running parallel computations that execute with checkpointing enabled. We then discuss how it is relevant to today's parallel computing environments and software, and present case studies of using the model to select runtime parameters.
Keywords :
parallel programming; software fault tolerance; software performance evaluation; checkpointing systems; parallel checkpointing systems; parallel computing; performance models; processor allocation; runtime parameters; Checkpointing; Computer science; Distributed computing; Electrical capacitance tomography; Electronic switching systems; Parallel processing; Runtime;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Fault-Tolerant Computing, 1999. Digest of Papers. Twenty-Ninth Annual International Symposium on
Conference_Location :
Madison, WI, USA
ISSN :
0731-3071
Print_ISBN :
0-7695-0213-X
Type :
conf
DOI :
10.1109/FTCS.1999.781059
Filename :
781059