Title :
Checkpointing and error recovery in a uniprocessor system with on-chip cache
Author :
Ahmed, Rana Ejaz
Author_Institution :
Res. in Motion Ltd., Waterloo, Ont., Canada
Abstract :
The checkpointing and rollback error recovery technique used in fault-tolerant systems allows recovery from errors without a global restart of the computation. This paper presents two efficient, low-cost schemes for handling soft (transient) errors in a uniprocessor system with on-chip cache memory. These user-transparent schemes are implemented in hardware and add negligible hardware overhead to the designs of the processor and cache memory. The first scheme uses a write-through policy for the on-chip cache and establishes a checkpoint on each write-through; the second scheme improves on the first by including a second-level cache in the memory hierarchy. A simple mathematical model is developed, and a trade-off analysis between the two schemes is presented.
Keywords :
cache storage; fault tolerant computing; system recovery; checkpointing; error recovery; fault-tolerant systems; on-chip cache; rollback error recovery; uniprocessor system; write-through policy; Cache memory; Checkpointing; Communication system software; Communications technology; Computer errors; Fault tolerant systems; Hardware; Mathematical model; Process design; System-on-a-chip
Conference_Title :
Electrical and Computer Engineering, 2001. Canadian Conference on
Conference_Location :
Toronto, Ont.
Print_ISBN :
0-7803-6715-4
DOI :
10.1109/CCECE.2001.933720