Title :
The average availability of parallel checkpointing systems and its importance in selecting runtime parameters
Author :
Plank, J.S. ; Thomason, M.G.
Author_Institution :
Dept. of Comput. Sci., Tennessee Univ., Knoxville, TN, USA
Abstract :
Performance prediction of checkpointing systems in the presence of failures is a well-studied research area. While the literature abounds with performance models of checkpointing systems, none address the issue of selecting runtime parameters other than the optimal checkpointing interval. In particular, the issue of processor allocation is typically ignored. In this paper we briefly present a performance model for long-running parallel computations that execute with checkpointing enabled. We then discuss how it is relevant to today's parallel computing environments and software, and present case studies of using the model to select runtime parameters.
Keywords :
parallel programming; software fault tolerance; software performance evaluation; checkpointing systems; parallel checkpointing systems; parallel computing; performance models; processor allocation; runtime parameters; Checkpointing; Computer science; Distributed computing; Electrical capacitance tomography; Electronic switching systems; Parallel processing; Runtime
Conference_Title :
Fault-Tolerant Computing, 1999. Digest of Papers. Twenty-Ninth Annual International Symposium on
Conference_Location :
Madison, WI, USA
Print_ISBN :
0-7695-0213-X
DOI :
10.1109/FTCS.1999.781059