DocumentCode :
654959
Title :
[2009] A Stage-Level Recovery Scheme in Scalable Pipeline Modules for High Dependability
Author :
Jun Yao ; Shimada, Hiroki ; Kobayashi, Kaoru
Author_Institution :
Grad. Sch. of Inf. Sci., Nara Inst. of Sci. & Technol., Ikoma, Japan
fYear :
2010
fDate :
17-19 Jan. 2010
Firstpage :
21
Lastpage :
29
Abstract :
In recent years, the increasing error rate has become one of the major impediments to the application of new process technologies in electronic devices such as microprocessors. This necessitates research into fault-tolerance mechanisms at the device, micro-architecture and system levels to maintain correct computation in future microprocessors as process technologies advance. Space redundancy, in the form of dual or triple modular redundancy (DMR or TMR), is widely used to tolerate errors with a negligible performance loss. In this paper, at the micro-architecture level, we propose a very fine-grained recovery scheme based on a DMR processor architecture to cover every transient error inside the memory interface boundary. Our recovery method makes full use of the existing duplicated hardware in the DMR processor, which avoids a large hardware extension by not using the checkpoint buffers found in many fault-tolerant processors. The hardware-based recovery is achieved by dynamically triggering an instruction re-execution procedure in the cycle after error detection, which implies a near-zero performance impact for achieving error-free execution. A TMR architecture is usually preferred because it provides seamless error correction through majority voting logic and therefore incurs no recovery delay. With our fast recovery scheme at a low hardware cost, our results show that even under a relatively high transient error rate, it is possible to use only a DMR architecture to detect and recover from errors at a negligible performance cost. Our reliable processor is thus constructed to use DMR execution with fast recovery as its major working mode. It saves around one third of the energy consumption of a traditional TMR architecture while maintaining the transient error coverage.
Keywords :
checkpointing; fault tolerant computing; pipeline processing; power aware computing; DMR processor architecture; TMR architecture; checkpoint buffers; dual modular redundancy; duplicated hardware; electronic devices; energy consumption; error detection; error rate; fault-tolerable processors; fine-grained recovery scheme; hardware-based recovery; instruction re-execution procedure; majority voting logic; memory interface boundary; microarchitecture level; microprocessors; near-zero performance impact; scalable pipeline modules; space redundancy; stage-level recovery scheme; transient error coverage; triple modular redundancy; Computer architecture; Hardware; Pipelines; Program processors; Redundancy; Registers; Transient analysis; Fault tolerance; redundancy; system recovery;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Innovative Architecture for Future Generation High Performance (IWIA), 2010 International Workshop on
Conference_Location :
Kona, HI
ISSN :
1527-1366
Type :
conf
DOI :
10.1109/IWIA.2010.11
Filename :
6685623