Author :
Fu, Xin ; Poe, James ; Li, Tao ; Fortes, José A B
Abstract :
Computer systems increasingly depend on exploiting program dynamic behavior to optimize performance, power and reliability. Prior studies have shown that program execution exhibits phase behavior in both performance and power domains. Reliabilityoriented program phase behavior, however, remains largely unexplored. As semiconductor transient faults (soft errors) emerge as a critical challenge to reliable system design, characterizing program phase behavior from a reliability perspective is crucial in order to apply dynamic fault-tolerant mechanisms and to optimize performance/reliability trade-offs. In this paper, we compute run-time program vulnerability to soft errors on four microarchitecture structures (i.e. instruction window, reorder buffer, function units and wakeup table) in a high-performance out-of-order execution superscalar processor. Experimental results on the SPEC2000 benchmarks show a considerable amount of time varying behavior in reliability measurements. Our study shows that a single performance metric, such as IPC, cache miss or branch misprediction, is not a good indicator for program vulnerability. The vulnerabilities of the studied microarchitecture structures are then correlated with program code-structure and run-time events to identify vulnerability phase behavior. We observed that both program code-structure and run-time events appear promising in classifying program reliability phase behavior. Overall, performance counter based schemes achieved an average Coefficient of Variation (COV) of 3.5%, 4.5%, 4.3% and 5.7% on the instruction queue, reorder buffer, function units and the wakeup table, while basic block vectors offer COVs of 4.9%, 5.8%, 5.4% and 6% on the four studied microarchitecture structures respectively. We found that in general, tracking performance metrics performs better than tracking control flow in identifying reliability phase behavior of applications. To our knowledge, this paper is the first to characterize program reliability phase behavior at the microarchitecture level.
Conference_Titel :
Modeling, Analysis, and Simulation of Computer and Telecommunication Systems, 2006. MASCOTS 2006. 14th IEEE International Symposium on