Title :
Towards understanding the effects of intermittent hardware faults on programs
Author :
Rashid, Layali ; Pattabiraman, Karthik ; Gopalakrishnan, Sathish
Author_Institution :
Univ. of British Columbia, Vancouver, BC, Canada
fDate :
June 28 2010-July 1 2010
Abstract :
Intermittent hardware faults are bursts of errors that last from a few CPU cycles to a few seconds. They are caused by process variations, circuit wear-out, and fluctuations in temperature, clock, or voltage. Recent studies show that intermittent fault rates are increasing with technology scaling and are likely to be a significant concern in future systems. We study how intermittent faults propagate to programs, focusing on the crash behaviour of programs. We model a program by the data dependencies in its fault-free execution trace, and we analyze this model to infer information about the length of intermittent faults and their effect on the program under specific fault and crash models. The results of our study can aid fault detection, diagnosis, and recovery techniques.
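The abstract describes analyzing the data dependencies of a fault-free trace under a fault model and a crash model, but does not specify either model. The sketch below is a hypothetical illustration only, not the authors' model. It assumes: a trace of register-level instructions (the names `Instr`, `simulate`, and `crash_fraction` are invented here); register dependencies only (memory dependencies are ignored); a fault model in which every instruction executed during a window of `length` instructions writes a corrupted value; and a crash model in which the program crashes when a corrupted register is used as a memory address.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass
class Instr:
    """One instruction of a (hypothetical) fault-free trace."""
    dest: Optional[str]           # register written, or None (e.g. a store)
    srcs: Tuple[str, ...] = ()    # registers read as data operands
    addr: Optional[str] = None    # register used as a memory address


def simulate(trace: Sequence[Instr], start: int, length: int) -> Optional[int]:
    """Return the index of the first crashing instruction, or None.

    Assumed fault model: every instruction with index in
    [start, start + length) writes a corrupted value to its dest.
    Assumed crash model: using a corrupted register as an address crashes.
    """
    corrupted = set()
    for i, ins in enumerate(trace):
        # Crash check happens before the instruction produces its result.
        if ins.addr is not None and ins.addr in corrupted:
            return i
        faulty = start <= i < start + length
        tainted = faulty or any(s in corrupted for s in ins.srcs)
        if ins.dest is not None:
            if tainted:
                corrupted.add(ins.dest)      # corruption propagates forward
            else:
                corrupted.discard(ins.dest)  # overwritten with a correct value
    return None


def crash_fraction(trace: Sequence[Instr], length: int) -> float:
    """Fraction of fault start points (one per instruction) that lead to a crash."""
    crashes = sum(simulate(trace, s, length) is not None for s in range(len(trace)))
    return crashes / len(trace)


# Hypothetical six-instruction trace.
trace = [
    Instr("r1", ("r0",)),             # 0: r1 = r0 + 1
    Instr("r2", ("r1",)),             # 1: r2 = r1 * 4
    Instr("r3", ("r0",)),             # 2: r3 = r0 - 1
    Instr("r4", (), addr="r2"),       # 3: r4 = load [r2]
    Instr("r1", ("r3",)),             # 4: r1 = r3 (overwrites r1)
    Instr(None, ("r4",), addr="r1"),  # 5: store r4 -> [r1]
]

for length in (1, 2):
    print(length, crash_fraction(trace, length))
```

In this toy trace, a one-instruction fault at index 3 corrupts only `r4`, which is stored as data and never used as an address, so it does not crash under the assumed crash model. Lengthening the fault window to two instructions makes more start points reach an address operand (4 of 6 start points crash for length 1, 5 of 6 for length 2). This is one simple way a dependency-based model can relate fault length to crash behaviour.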
Keywords :
fault diagnosis; fault tolerant computing; program diagnostics; CPU cycle; circuit wear out; data dependency; fault detection; fault diagnosis; fault free trace; intermittent hardware fault; process variation; recovery technique; voltage fluctuation; Circuit faults; Clocks; Computer crashes; Computer errors; Fault detection; Fault diagnosis; Hardware; Information analysis; Temperature; Voltage fluctuations;
Conference_Title :
Dependable Systems and Networks Workshops (DSN-W), 2010 International Conference on
Conference_Location :
Chicago, IL
Print_ISBN :
978-1-4244-7729-6
Electronic_ISBN :
978-1-4244-7728-9
DOI :
10.1109/DSNW.2010.5542613