Title :
A cache protocol for error detection and recovery in fault-tolerant computing systems
Author :
Chung-Ho Chen ; Somani, A.K.
Author_Institution :
Dept. of Electr. Eng., Nat. Yunlin Inst. of Technol., Taiwan
Abstract :
We propose an error detection and recovery protocol for redundant processor systems employing caches. The protocol allows cache-based systems to vote more often and thereby reduce the chance of losing synchronization. The scheme is based on broadcasting a dirty cache line after it is modified, and it effectively exploits the redundancy of a fault-tolerant system through hardware voting. It recovers from erroneous data written by a processor, which remedies the insufficiency of error-correcting codes. The protocol can also be used to speed up the resynchronization process for a temporarily failed processor in a redundant system. More than 60% of cache lines are fully covered for recovery from errors originating in the cache itself, including unrecoverable ECC errors. The performance overhead is limited to broadcasting only 2-3% of the total memory references.
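As a rough illustration of the voting idea described in the abstract, the sketch below majority-votes over the copies of one dirty cache line broadcast by each redundant processor and overwrites any dissenting copy with the agreed value. It is a software analogy under stated assumptions, not the authors' hardware protocol: the function names `vote_on_line` and `repair`, the use of byte strings for cache lines, and the strict-majority rule are all hypothetical choices made for this example.

```python
from collections import Counter
from typing import List, Optional, Sequence, Tuple


def vote_on_line(copies: Sequence[bytes]) -> Tuple[Optional[bytes], List[int]]:
    """Majority-vote over the broadcast copies of one dirty cache line.

    Returns (agreed_value, dissenting_indices). If no value is held by a
    strict majority, agreed_value is None and every index is reported.
    Hypothetical helper: the paper's voter is implemented in hardware.
    """
    value, count = Counter(copies).most_common(1)[0]
    if count * 2 <= len(copies):
        # No strict majority: the vote cannot identify a correct value.
        return None, list(range(len(copies)))
    dissenters = [i for i, c in enumerate(copies) if c != value]
    return value, dissenters


def repair(copies: Sequence[bytes]) -> List[bytes]:
    """Return the copies with dissenting lines replaced by the voted value.

    Assumption for this sketch: recovery means overwriting a faulty copy
    with the majority value. Raises ValueError when no majority exists.
    """
    value, _ = vote_on_line(copies)
    if value is None:
        raise ValueError("no majority among broadcast copies")
    return [value for _ in copies]
```

For example, with three processors where the third holds a corrupted line, `vote_on_line([b"\x01\x02", b"\x01\x02", b"\xff\x02"])` identifies index 2 as the dissenter, and `repair` restores all three copies to the agreed value.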
Keywords :
buffer storage; error correction; error detection; fault tolerant computing; protocols; redundancy; storage management; synchronisation; cache data broadcasting; cache protocol; dirty line; erroneous data; error detection; error recovery; fault-tolerant computing systems; hardware voting; performance overhead; redundant processor systems; synchronization; Broadcasting; Cache memory; Casting; Delay; Error correction codes; Fault detection; Fault tolerant systems; Hardware; Protocols; Voting;
Conference_Title :
Fault-Tolerant Computing, 1994. FTCS-24. Digest of Papers., Twenty-Fourth International Symposium on
Conference_Location :
Austin, TX, USA
Print_ISBN :
0-8186-5520-8
DOI :
10.1109/FTCS.1994.315632