Title :
Metrics for Architecture-Level Lifetime Reliability Analysis
Author :
Ramachandran, Pradeep ; Adve, Sarita V. ; Bose, Pradip ; Rivers, Jude A.
Author_Institution :
Dept. of Comput. Sci., Univ. of Illinois at Urbana-Champaign, Urbana, IL
Abstract :
This work concerns metrics for evaluating microarchitectural enhancements to improve processor lifetime reliability. A commonly reported reliability metric is mean time to failure (MTTF). Although the MTTF metric is simpler to evaluate, it does not provide information on the reliability characteristics during the relatively short operational life of commodity processors. An alternate metric is nTTF, which represents the time to failure of n% of the processor population. nTTF is a more informative metric for the (short) portion of the lifetime that is relevant to the end user, but determining it requires knowledge of the distribution of processor failure times, which is generally hard to obtain. The goals of this paper are (1) to determine if the choice of metric has a quantitative impact on architecture-level reliability analysis and modern superscalar processor designs and (2) to build a fundamental understanding of why and when MTTF- and nTTF-driven analyses result in different designs. We show through an in-depth analysis that, in general, the nTTF metric differs significantly from the MTTF metric, and using MTTF as a proxy for nTTF leads to sub-optimal designs. Additionally, our analysis introduces the concept of relative vulnerability factor (RVF) for different processor components to guide reliability-aware design. We show that the difference between nTTF- and MTTF-driven design largely occurs because the relative vulnerabilities of the processor components change over the processor lifetime, making the optimal design choice dependent on the amount of time the processor is expected to be used.
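As an illustration of why MTTF can be a poor proxy for nTTF, the following minimal sketch compares two hypothetical designs whose failure times have the same mean but different early-life behavior. It assumes Weibull-distributed failure times purely for illustration; the abstract does not state which distribution the paper uses, and the names `TARGET_MTTF`, `designs`, and the shape values are invented for this example.

```python
import math

def weibull_mttf(scale, shape):
    """Mean of a Weibull(scale, shape) lifetime: scale * Gamma(1 + 1/shape)."""
    return scale * math.gamma(1.0 + 1.0 / shape)

def weibull_nttf(scale, shape, n_percent):
    """Time by which n% of the population has failed (the n-th percentile)."""
    p = n_percent / 100.0
    return scale * (-math.log(1.0 - p)) ** (1.0 / shape)

def scale_for_mttf(target_mttf, shape):
    """Pick the Weibull scale so that the mean equals target_mttf."""
    return target_mttf / math.gamma(1.0 + 1.0 / shape)

# Hypothetical values: both designs are tuned to the same MTTF (in years),
# but differ in the assumed Weibull shape parameter.
TARGET_MTTF = 30.0
designs = {"A": 1.0, "B": 2.0}

results = {}
for name, shape in designs.items():
    scale = scale_for_mttf(TARGET_MTTF, shape)
    mttf = weibull_mttf(scale, shape)
    one_pct_ttf = weibull_nttf(scale, shape, 1.0)  # nTTF with n = 1
    results[name] = (mttf, one_pct_ttf)
    print(f"design {name}: MTTF = {mttf:.2f} y, 1%TTF = {one_pct_ttf:.2f} y")
```

Under these assumptions the two designs are indistinguishable by MTTF, yet the time by which 1% of parts have failed differs by roughly an order of magnitude, which is the kind of gap between the two metrics that the abstract describes.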
Keywords :
integrated circuit reliability; microprocessor chips; MTTF metric; architecture-level lifetime reliability analysis; mean time to failure; microarchitectural enhancements; nTTF metric; processor lifetime reliability; relative vulnerability factor; reliability metric; reliability-aware design; Accelerated aging; Computer science; Electric breakdown; Electromigration; Failure analysis; Microarchitecture; Negative bias temperature instability; Niobium compounds; Process design; Rivers;
Conference_Title :
Performance Analysis of Systems and Software, 2008. ISPASS 2008. IEEE International Symposium on
Conference_Location :
Austin, TX
Print_ISBN :
978-1-4244-2232-6
Electronic_ISBN :
978-1-4244-2233-3
DOI :
10.1109/ISPASS.2008.4510752