DocumentCode :
1149762
Title :
Experimental evaluation of error-detection mechanisms
Author :
Constantinescu, Cristian
Author_Institution :
Intel Corp., Hillsboro, OR, USA
Volume :
52
Issue :
1
fYear :
2003
fDate :
3/1/2003 12:00:00 AM
Firstpage :
53
Lastpage :
57
Abstract :
Effective error detection is paramount for building highly dependable computing systems. A new methodology, based on physical and simulated fault injection, has been developed for assessing the effectiveness of error-detection mechanisms. This approach has two steps: (1) transient faults are physically injected at the IC pin level of a prototype in order to derive the error-detection coverage. Experiments are carried out in a 3-dimensional space of events whose dimensions are the location, time of occurrence, and duration of the injected fault. (2) Simulated fault injection is performed to assess the effectiveness of new error-detection mechanisms designed to improve the detection coverage. Complex circuitry based on checking for protocol violations is considered. A temporal model of the protocol checker is used, and transient faults are injected into signal traces captured from the prototype system; these traces serve as inputs to the simulation engine. The s-confidence intervals of the error-detection coverage are derived for both the initial design and the new detection mechanism. Physical fault injection, carried out on a prototype server, proved that several signals were sensitive to transient faults and that error-detection coverage was unacceptably low. Simulated fault injection shows that an error-detection mechanism based on checking for protocol violations can appreciably increase the detection coverage, especially for transient faults longer than 200 nanoseconds. Additional research is required to improve the detection of shorter transients. Fault-injection experiments also show that error-detection coverage is a function of fault duration: the shorter the transient fault, the lower the coverage. As a consequence, injecting faults that have a single, predefined duration, as was frequently done in the past, does not provide accurate information on the effectiveness of the error-detection mechanisms.
Injecting only permanent faults leads to unrealistically high estimates of the coverage. These experiments prove that combined physical and simulated fault injection, performed in a 3-dimensional space of events, is a superior approach that allows designers to accurately assess the efficacy of various candidate error-detection mechanisms without building expensive test circuits.
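The abstract's main quantitative point, that coverage should be estimated separately for each fault duration and reported with s-confidence intervals, can be illustrated with a short sketch. It assumes each injected fault is logged as a point in the (location, time of occurrence, duration) event space together with a detected/undetected outcome. The `Trial` record, its field names, and the pin names are hypothetical, and the Wilson score interval is an assumed choice of binomial interval, not necessarily the estimator used in the paper.

```python
import math
from dataclasses import dataclass
from statistics import NormalDist


@dataclass
class Trial:
    """One injected fault: a point in the (location, time, duration) event space.

    Hypothetical record layout; the paper does not specify a log format.
    """
    location: str       # e.g. an IC pin name (hypothetical)
    time_ns: float      # time of occurrence of the fault
    duration_ns: float  # duration of the transient fault
    detected: bool      # True if an error-detection mechanism flagged it


def wilson_interval(detected: int, total: int, confidence: float = 0.95) -> tuple[float, float]:
    """Two-sided Wilson score interval for coverage = detected / total.

    Assumption: Wilson is used here for illustration; other binomial
    intervals (e.g. Clopper-Pearson) are equally valid choices.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # ~1.96 for 95 %
    p = detected / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def coverage_by_duration(trials: list[Trial]) -> dict[float, tuple[float, float, float]]:
    """Group trials by fault duration; return {duration: (coverage, low, high)}."""
    counts: dict[float, tuple[int, int]] = {}
    for t in trials:
        n_det, n_tot = counts.get(t.duration_ns, (0, 0))
        counts[t.duration_ns] = (n_det + int(t.detected), n_tot + 1)
    result = {}
    for duration in sorted(counts):
        n_det, n_tot = counts[duration]
        low, high = wilson_interval(n_det, n_tot)
        result[duration] = (n_det / n_tot, low, high)
    return result


if __name__ == "__main__":
    # Made-up outcomes for illustration only; these are NOT data from the paper.
    trials = [Trial("pin_A", 10.0 * i, 50.0, i % 2 == 0) for i in range(10)]
    trials += [Trial("pin_A", 10.0 * i, 300.0, i != 0) for i in range(10)]
    for duration, (cov, lo, hi) in coverage_by_duration(trials).items():
        print(f"{duration:>6.0f} ns  coverage={cov:.2f}  interval=[{lo:.2f}, {hi:.2f}]")
```

Grouping by location or time of occurrence instead of duration follows the same pattern, which is how a single log can be sliced along any axis of the event space. With small sample counts the intervals are wide, which is one reason a single summary coverage figure can be misleading.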
Keywords :
error detection; fault location; fault tolerant computing; network servers; probability; 3-dimensional space of events; IC pin; complex circuitry; computing systems; error-detection; error-detection coverage; error-detection mechanisms; fault location; injected fault duration; permanent faults; protocol violations; s-confidence intervals; server; shorter transients; signal traces; simulated fault injection; simulation engine; time of occurrence; transient faults; Circuit faults; Circuit simulation; Circuit testing; Computational modeling; Electrical fault detection; Engines; Fault detection; Fault location; Protocols; Prototypes;
fLanguage :
English
Journal_Title :
Reliability, IEEE Transactions on
Publisher :
IEEE
ISSN :
0018-9529
Type :
jour
DOI :
10.1109/TR.2002.805785
Filename :
1179798