DocumentCode :
3475027
Title :
The performance of consistent checkpointing in distributed shared memory systems
Author :
Cabillic, Gilbert ; Muller, Gilles ; Puaut, Isabelle
Author_Institution :
IRISA, Rennes, France
fYear :
1995
fDate :
13-15 Sep 1995
Firstpage :
96
Lastpage :
105
Abstract :
This paper presents the design and implementation of a consistent checkpointing scheme for distributed shared memory (DSM) systems. Our approach relies on the integration of checkpoints within synchronization barriers already existing in applications; this avoids the need to introduce an additional synchronization mechanism. The main advantage of our checkpointing mechanism is that performance degradation arises only when a checkpoint is being taken; hence, the programmer can adjust the trade-off between the cost of checkpointing and the cost of longer rollbacks by adjusting the time between two successive checkpoints. The paper compares several implementations of the proposed consistent checkpointing mechanism (incremental, non-blocking, and pre-flushing) on the Intel Paragon multicomputer for several parallel scientific applications. Performance measurements show that careful optimization of the checkpointing protocol can reduce the time overhead of checkpointing from 8% to 0.04% of the application duration for a 6-minute checkpointing interval.
Keywords :
distributed memory systems; message passing; program debugging; shared memory systems; software performance evaluation; synchronisation; Intel Paragon multicomputer; consistent checkpointing; distributed shared memory systems; parallel scientific applications; performance; performance degradation; rollbacks; synchronization barriers; Checkpointing; Computer crashes; Costs; Degradation; Frequency synchronization; Hardware; Message passing; Protocols; Random access memory; Time measurement;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Reliable Distributed Systems, 1995. Proceedings., 14th Symposium on
Conference_Location :
Bad Neuenahr
ISSN :
1060-9857
Print_ISBN :
0-8186-7153-X
Type :
conf
DOI :
10.1109/RELDIS.1995.526217
Filename :
526217