DocumentCode :
2979044
Title :
Fault-tolerance using cache-coherent distributed shared memory systems
Author :
Hecht, D.L. ; Kavi, K.M. ; Gaede, R.K. ; Katsinis, C.
Author_Institution :
Alabama Univ., Huntsville, AL, USA
fYear :
1999
fDate :
1999
Firstpage :
100
Lastpage :
105
Abstract :
Describes new protocols augmenting traditional cache coherency mechanisms to implement fault tolerance based on recovery blocks and checkpointing. Concurrent processes complicate rollback recovery, since a rollback can potentially lead to a “domino effect” whereby the process is rolled back to the beginning. Several approaches have been proposed to limit the domino effect. One set of such techniques requires communicating processes to periodically synchronize in order to checkpoint a globally consistent state. These schemes can be implemented more naturally on distributed shared memory systems using synchronization on shared memory. We have developed extensions to well-known cache-coherency methods (e.g. directory-based) for the implementation of checkpointing consistent states.
Keywords :
cache storage; coherence; distributed shared memory systems; fault tolerant computing; memory protocols; synchronisation; system recovery; cache-coherent distributed shared memory systems; checkpointing; communicating process synchronization; concurrent processes; directory-based cache-coherency methods; domino effect; fault tolerance; globally consistent state; protocols; recovery blocks; rollback recovery; Decision support systems; Fault tolerant systems
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Parallel Architectures, Algorithms, and Networks, 1999. (I-SPAN '99) Proceedings. Fourth International Symposium on
Conference_Location :
Perth/Fremantle, WA
ISSN :
1087-4089
Print_ISBN :
0-7695-0231-8
Type :
conf
DOI :
10.1109/ISPAN.1999.778924
Filename :
778924