Regional Information Center for Science and Technology

DocumentCode :

2044051

Title :

Cooperative checkpointing theory

Author :

Oliner, Adam ; Rudolph, Larry ; Sahoo, Ramendra

Author_Institution :

Dept. of Comput. Sci., Stanford Univ., Palo Alto, CA

fYear :

2006

fDate :

25-29 April 2006

Abstract :

Cooperative checkpointing uses global knowledge of the state and health of the machine to improve performance and reliability by dynamically deciding when to skip checkpoint requests made by applications. Using results from cooperative checkpointing theory, this paper proves that periodic checkpointing is not expected to be competitive with the offline optimal. By leveraging probabilistic information about the future, cooperative checkpointing gives flexible algorithms that are optimally competitive. The results prove that simulating periodic checkpointing, by performing only every dth checkpoint, is not competitive with the offline optimal in the worst case; a simple modification gives a provably competitive algorithm. Calculations using failure traces from a prototype of IBM's Blue Gene/L show that an application using cooperative checkpointing may make progress 4 times faster than one using periodic checkpointing, under realistic conditions. We contribute an approach to providing large-scale system reliability through cooperative checkpointing and techniques for analyzing the approach.

Keywords :

checkpointing; large-scale systems; IBM Blue Gene/L; cooperative checkpointing theory; large-scale system reliability; periodic checkpointing; Computer science; Cost function; Interference; Performance evaluation; Programming profession; Prototypes; Runtime

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Parallel and Distributed Processing Symposium, 2006. IPDPS 2006. 20th International

Conference_Location :

Rhodes Island

Print_ISBN :

1-4244-0054-6

Type :

conf

DOI :

10.1109/IPDPS.2006.1639368

Filename :

1639368

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2044051