Title :
Cross-core event monitoring for processor failure prediction
Author :
Salfner, Felix ; Tröger, Peter ; Tschirpke, Steffen
Author_Institution :
Humboldt Univ. Berlin, Berlin, Germany
Abstract :
A recent trend in the design of commodity processors is the combination of multiple independent execution units on one chip. As complexity and transistor count increase, a single execution unit on a processor is increasingly likely to become faulty. To address this, we propose an architecture for dependable process management in chip-multiprocessing machines. In our approach, execution units monitor each other to anticipate future hardware failures. The prediction relies on the analysis of processor hardware performance counters by a statistical rank-sum test. Initial experiments with the Intel Core processor platform demonstrated the feasibility of the approach, but also showed the need for further investigation because prediction quality varied widely in most cases.
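Illustration :
The abstract names a statistical rank-sum test over hardware performance counter samples but gives no details. The following is a minimal sketch of a two-sided Wilcoxon rank-sum test with a normal approximation, not the authors' implementation. The function name rank_sum_test, the sample values, and the 0.01 threshold are all hypothetical, and no continuity correction is applied.

```python
import math

def rank_sum_test(reference, observed):
    """Two-sided Wilcoxon rank-sum test (normal approximation).

    reference, observed: sequences of counter readings (hypothetical inputs).
    Returns (z, p_value); a small p_value suggests the two samples differ.
    """
    n1, n2 = len(reference), len(observed)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")

    # Pool the samples, remembering which group each value came from.
    pooled = sorted(
        (value, group)
        for group, sample in enumerate((reference, observed))
        for value in sample
    )

    # Assign ranks, giving tied values the average of their ranks.
    ranks = [0.0] * len(pooled)
    tie_term = 0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        average_rank = (i + 1 + j) / 2.0  # mean of ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = average_rank
        t = j - i
        tie_term += t ** 3 - t
        i = j

    # Rank sum of the reference sample and its moments under the null.
    w = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    n = n1 + n2
    mean = n1 * (n + 1) / 2.0
    variance = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance == 0:
        return 0.0, 1.0  # all values identical: no evidence of a shift

    z = (w - mean) / math.sqrt(variance)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return z, p_value

# Hypothetical counter readings: a baseline window and a suspicious window.
baseline = [100, 101, 102, 103, 104, 105, 106, 107, 108, 109]
suspect = [200, 201, 202, 203, 204, 205, 206, 207, 208, 209]
z, p = rank_sum_test(baseline, suspect)
flagged = p < 0.01  # assumed threshold, not taken from the paper
```

In a monitoring setting of the kind the abstract describes, one core could periodically compare a fresh window of another core's counter readings against a baseline window and flag the unit when the p-value falls below a chosen threshold.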
Keywords :
fault tolerant computing; microprocessor chips; multiprocessing systems; performance evaluation; Intel Core processor platform; chip-multiprocessing machine; commodity processor; cross-core event monitoring; independent execution unit; process management; processor failure prediction; processor hardware performance counter; statistical rank-sum test; transistor count; Condition monitoring; Counting circuits; Engines; Failure analysis; Hardware; Multicore processing; Parallel processing; Performance analysis; Process design; Testing; failure prediction; fault injection; multi-core; performance counter;
Conference_Title :
High Performance Computing & Simulation, 2009. HPCS '09. International Conference on
Conference_Location :
Leipzig
Print_ISBN :
978-1-4244-4906-4
Electronic_ISBN :
978-1-4244-4907-1
DOI :
10.1109/HPCSIM.2009.5191988