DocumentCode :
3782960
Title :
A fault tolerance infrastructure for dependable computing with high-performance COTS components
Author :
A. Avizienis
Author_Institution :
A. Avizienis & Assoc. Inc., Santa Monica, CA, USA
fYear :
2000
Firstpage :
492
Lastpage :
500
Abstract :
The failure rates of current COTS processors have dropped to 100 FITs (failures per 10^9 hours), indicating a potential MTTF of over 1100 years. However, our recent study of Intel P6 family processors has shown that they have very limited error detection and recovery capabilities and contain numerous design faults ("errata"). Other limitations are susceptibility to transient faults and uncertainty about "wearout" that could increase the failure rate over time. Because of these limitations, an external fault tolerance infrastructure is needed to assure the dependability of a system with such COTS components. The paper describes a fault-tolerant "infrastructure" system of fault tolerance functions that makes possible the use of low-coverage COTS processors in a fault-tolerant, self-repairing system. The custom hardware supports transient recovery, design fault tolerance, and self-repair by sparing and replacement. Fault tolerance functions are implemented by four types of hardware processors of low complexity that are themselves fault-tolerant. High error detection coverage, including coverage of design faults, is attained by diversity and replication.
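The abstract's step from 100 FITs to an MTTF of over 1100 years can be checked with a short calculation. The sketch below is illustrative only: it assumes a constant failure rate (so that MTTF is the reciprocal of the rate) and a 365.25-day year, neither of which is stated in the abstract, and the function name is hypothetical.

```python
# Convert a failure rate in FITs to a mean time to failure (MTTF) in years.
# Assumption: constant failure rate, so MTTF = 1 / rate.
# Assumption: one year = 365.25 days = 8766 hours.

HOURS_PER_YEAR = 24 * 365.25


def fit_to_mttf_years(fit: float) -> float:
    """Return the MTTF in years for a failure rate given in FITs.

    One FIT is one failure per 10^9 device-hours.
    """
    if fit <= 0:
        raise ValueError("failure rate in FITs must be positive")
    failures_per_hour = fit / 1e9
    mttf_hours = 1.0 / failures_per_hour
    return mttf_hours / HOURS_PER_YEAR


if __name__ == "__main__":
    # 100 FITs corresponds to 10^7 hours between failures on average.
    print(f"MTTF at 100 FITs: {fit_to_mttf_years(100):.0f} years")
```

Under these assumptions, 100 FITs corresponds to 10^7 hours, which is somewhat more than 1100 years and so agrees with the abstract. The figure covers only the quoted failure rate; it does not account for the transient faults, design faults, or wearout that the abstract names as separate limitations.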
Keywords :
"Fault tolerance","Fault detection","Hardware","Uncertainty","Error correction","Software design","Semiconductor devices","Logic devices","Logic design","Environmental factors"
Publisher :
IEEE
Conference_Titel :
Dependable Systems and Networks, 2000. DSN 2000. Proceedings International Conference on
Print_ISBN :
0-7695-0707-7
Type :
conf
DOI :
10.1109/ICDSN.2000.857581
Filename :
857581