Title :
Phase Characterization and Classification for Micro-architecture Soft Error
Author :
Cheng, Yu ; Ma, Anguo ; Tang, Yuxing ; Zhang, Minxuan
Author_Institution :
Sch. of Comput. Sci., Nat. Univ. of Defense Technol., Changsha, China
Abstract :
Transient faults have become a key challenge in modern processor design. Processor designers use the Architectural Vulnerability Factor (AVF) to estimate the soft error rate of micro-architectural structures. Dynamic, phase-based system reliability management, which tunes hardware and software parameters at runtime for different program phases, has become a focus of processor design research. The phase characterization technique (PCT) and the phase classification algorithm (PCA) together determine the accuracy of phase identification, which is the foundation of dynamic, phase-based system management. To our knowledge, this paper is the first to give a comprehensive evaluation and comparison of PCTs and PCAs for micro-architecture soft error. We first compare the effectiveness of PCTs based on basic block vectors (BBV) and on performance metric counters (PMC) for reliability-oriented phase characterization of three micro-architectural structures (the instruction queue, the functional unit and the reorder buffer). Experimental results show that the PMC-based PCT performs better than the BBV-based PCT for most of the programs studied. We also compare the accuracy of three algorithms (hierarchical clustering, k-means clustering and regression tree) in reliability-oriented phase classification. The regression tree method improves classification accuracy by 30% on average compared with the other two PCAs. Based on these comparisons, we propose the combination of PMC and regression tree as the optimal pairing of PCT and PCA for soft-error reliability-oriented phase identification. In addition, we quantify the upper bound on the predictability of AVF using BBV and PMC. Overall, PMC can explain 82% of AVF on average, while BBV can explain 78% on average.
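The abstract's notions of AVF being "explained" by a phase characterization, and of a regression tree used for phase classification, can be illustrated with a minimal sketch. It rests on assumptions that do not come from the paper: the explained fraction is taken here as 1 minus the ratio of within-phase to total AVF variance, the tree is reduced to a single threshold split on one counter, and the IPC and AVF samples are hypothetical.

```python
from statistics import fmean

def explained_fraction(avf, phases):
    """Fraction of AVF variance explained by phase labels.

    Assumed definition (not taken from the paper):
    1 - (within-phase squared error / total squared error).
    """
    overall = fmean(avf)
    total = sum((a - overall) ** 2 for a in avf)
    if total == 0:
        return 1.0  # constant AVF is trivially explained
    groups = {}
    for a, p in zip(avf, phases):
        groups.setdefault(p, []).append(a)
    within = 0.0
    for g in groups.values():
        mean_g = fmean(g)
        within += sum((a - mean_g) ** 2 for a in g)
    return 1.0 - within / total

def best_split(feature, avf):
    """One regression-tree step: pick the threshold on a single
    counter that best separates intervals by AVF."""
    best_threshold, best_score = None, -1.0
    values = sorted(set(feature))
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2  # candidate threshold between neighbours
        labels = [x > t for x in feature]
        score = explained_fraction(avf, labels)
        if score > best_score:
            best_threshold, best_score = t, score
    return best_threshold, best_score

# Hypothetical per-interval samples: one counter (IPC) and measured AVF.
ipc = [0.5, 0.6, 0.55, 1.8, 1.9, 2.0]
avf = [0.10, 0.12, 0.11, 0.40, 0.42, 0.41]
threshold, score = best_split(ipc, avf)
print(f"split at IPC > {threshold:.2f}, explained fraction {score:.3f}")
```

A full regression tree would apply such a split recursively over many counters. The sketch only shows why a supervised split, chosen to reduce AVF error directly, can track AVF phases more closely than clustering on the feature vectors alone.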
Keywords :
fault tolerant computing; logic design; microprocessor chips; pattern clustering; software architecture; architectural vulnerability factor; basic block vectors; hierarchical clustering; k-means clustering; microarchitecture soft error rate; performance metric counters; phase characterization technique; phase classification algorithm; phase identification; phase-based system reliability management; processor design; regression tree method; transient faults; phase characterization; phase classification; predictability of reliability; soft error
Conference_Title :
Embedded and Ubiquitous Computing (EUC), 2010 IEEE/IFIP 8th International Conference on
Conference_Location :
Hong Kong
Print_ISBN :
978-1-4244-9719-5
Electronic_ISBN :
978-0-7695-4322-2
DOI :
10.1109/EUC.2010.109