DocumentCode
327893
Title
An experimental study about diskless checkpointing
Author
Silva, Luis M. ; Silva, João Gabriel
Author_Institution
Dept. de Engenharia Inf., Coimbra Univ., Portugal
Volume
1
fYear
1998
fDate
25-27 Aug 1998
Firstpage
395
Abstract
Checkpointing and rollback recovery is a very effective technique for tolerating failures. Usually, the checkpoint data is saved in disk files. However, in some situations the disk operations may result in considerable performance overhead. Alternative solutions use main memory to hold the checkpoint data. The paper presents two main-memory checkpointing schemes that can be used in any parallel machine without requiring any change to the hardware: one scheme saves the checkpoints in the memory of other processors, while the other is based on a parity approach. Both techniques have been implemented and evaluated on a commercial parallel machine. The conclusions drawn clearly show the superiority of one of those schemes.
Keywords
fault tolerant computing; parallel machines; parallel programming; storage management; system recovery; checkpoint data; commercial parallel machine; disk operation; diskless checkpointing; experimental study; memory checkpointing schemes; parity approach; performance overhead; rollback recovery; Checkpointing; Computer crashes; Fault tolerance; Hardware; Maintenance; Parallel machines; Random access memory; Read-write memory; Workstations; Writing;
fLanguage
English
Publisher
IEEE
Conference_Titel
Euromicro Conference, 1998. Proceedings. 24th
Conference_Location
Vasteras
ISSN
1089-6503
Print_ISBN
0-8186-8646-4
Type
conf
DOI
10.1109/EURMIC.1998.711832
Filename
711832