DocumentCode :
3663964
Title :
FaultHound: Value-locality-based soft-fault tolerance
Author :
Nitin;Irith Pomeranz;T. N. Vijaykumar
Author_Institution :
School of Electrical and Computer Engineering, Purdue University, USA
fYear :
2015
fDate :
6/1/2015
Firstpage :
668
Lastpage :
681
Abstract :
Soft error susceptibility is a growing concern with continued CMOS scaling. Previous work explores full- and partial-redundancy schemes in hardware and software for soft-fault tolerance. However, full-redundancy schemes incur high performance and energy overheads, whereas partial-redundancy schemes achieve low coverage. An initial study, called Perturbation Based Fault Screening (PBFS), explores exploiting value locality to provide hints of soft faults whenever a value falls outside its neighborhood. PBFS employs bit-mask filters to capture value neighborhoods. However, PBFS achieves low coverage; straightforwardly improving the coverage results in high false-positive rates and high performance and energy overheads. We propose FaultHound, a value-locality-based soft-fault tolerance scheme, which employs five mechanisms to address PBFS's limitations: (1) a scheme to cluster the filters via an inverted organization of the filter tables to reinforce learning and reduce the false-positive rates; (2) a learning scheme for ignoring the delinquent bit positions that raise repeated false alarms, to further reduce the false-positive rate; (3) a light-weight predecessor replay scheme instead of a full rollback to reduce the performance and energy penalty of the remaining false positives; (4) a simple scheme to distinguish rename faults, which require rollback instead of replay for recovery, from false positives to avoid unnecessary rollback penalty; and (5) a detection scheme, which avoids rollback, for the load-store queue, which is not covered by our replay. Using simulations, we show that while PBFS achieves either low coverage (30%), or high false-positive rates (8%) with high performance overheads (97%), FaultHound achieves higher coverage (75%) and lower false-positive rates (3%) with lower performance and energy overheads (10% and 25%).
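As a rough illustration of the idea of a bit-mask filter that flags values falling outside a learned neighborhood, the following Python sketch may help. It is not taken from the paper: the class name `BitMaskFilter`, the `loose` mask, the 16-bit width, and the learning rule (any bit that has ever changed is tolerated afterwards) are all assumptions made for this example, and the actual PBFS and FaultHound filters may work quite differently.

```python
class BitMaskFilter:
    """Hypothetical value-neighborhood filter (illustrative only)."""

    def __init__(self, width=16):
        self.full = (1 << width) - 1  # mask limiting values to `width` bits
        self.reference = None         # most recent value seen by this filter
        self.loose = 0                # bit positions already observed to change

    def check(self, value):
        """Return True if `value` lies outside the learned neighborhood."""
        value &= self.full
        if self.reference is None:
            # First observation: nothing to compare against yet.
            self.reference = value
            return False
        changed = value ^ self.reference
        # Only changes in bit positions never seen to vary raise a hint.
        outside = (changed & ~self.loose & self.full) != 0
        # Assumed learning rule: widen the neighborhood and move the reference.
        self.loose |= changed
        self.reference = value
        return outside


f = BitMaskFilter()
hints = [f.check(v) for v in (0x1000, 0x1001, 0x1000, 0x9000)]
```

In this sketch the first change to a low-order bit raises a hint once and is then tolerated, while a later flip of a previously stable high-order bit raises a new hint. Under these assumptions, every first-time change is reported, which illustrates why such a filter can produce false positives.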
Keywords :
"Matched filters","Pipelines"
Publisher :
IEEE
Conference_Title :
Computer Architecture (ISCA), 2015 ACM/IEEE 42nd Annual International Symposium on
Type :
conf
DOI :
10.1145/2749469.2750372
Filename :
7284103