DocumentCode
2679435
Title
Assuring application-level correctness against soft errors
Author
Cong, Jason ; Gururaj, Karthik
Author_Institution
Dept. of Comput. Sci., Univ. of California, Los Angeles, CA, USA
fYear
2011
fDate
7-10 Nov. 2011
Firstpage
150
Lastpage
157
Abstract
Traditionally, research in fault tolerance has required architectural state to be numerically perfect for program execution to be correct. However, in many programs, even if execution is not 100% numerically correct, the program can still appear to execute correctly from the user's perspective. To quantify user satisfaction, application-level fidelity metrics (such as PSNR) can be used. The output of such applications is defined to be correct if the fidelity metrics satisfy a certain threshold. However, such applications still contain instructions whose outputs are critical, i.e., their correctness decides whether the overall quality of the program output is acceptable. In this paper, we present an analysis technique for identifying such critical program segments. More importantly, our technique is capable of guaranteeing application-level correctness through a combination of static analysis and runtime monitoring. Our static analysis consists of data flow analysis followed by control flow analysis to find static critical instructions, i.e., instructions that affect several other instructions. In a profiling phase, critical instructions are further refined into likely non-critical and likely critical sets. At runtime, a monitoring scheme tracks the likely non-critical instructions and takes remedial action if any of them become critical. Based on this analysis, we minimize the number of instructions that are duplicated and checked at runtime using a software-based fault detection and recovery technique [20]. Taken together, our approach can lead to 22% average energy savings for multimedia applications while guaranteeing application-level correctness, compared with a recent work [9] that cannot guarantee application-level correctness. Compared with the approach proposed in [20], which guarantees both application-level and numerical correctness, our method achieves a 79% energy reduction.
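The abstract treats an output as correct when a fidelity metric such as PSNR meets a threshold. The following is a minimal sketch of such an acceptance check, assuming 8-bit samples (peak value 255) and a hypothetical 30 dB threshold; the abstract does not state the threshold actually used, and the names `psnr` and `is_acceptable` are illustrative, not taken from the paper.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[int], output: Sequence[int], max_value: int = 255) -> float:
    """Peak signal-to-noise ratio in dB between a golden and a possibly faulty output."""
    if len(reference) != len(output) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all samples.
    mse = sum((r - o) ** 2 for r, o in zip(reference, output)) / len(reference)
    if mse == 0:
        return math.inf  # numerically perfect output
    return 10.0 * math.log10(max_value ** 2 / mse)


# Hypothetical threshold: an output is "application-level correct" at or above it.
THRESHOLD_DB = 30.0


def is_acceptable(reference: Sequence[int], output: Sequence[int],
                  threshold_db: float = THRESHOLD_DB) -> bool:
    """Accept the output if its fidelity meets the (assumed) PSNR threshold."""
    return psnr(reference, output) >= threshold_db


if __name__ == "__main__":
    golden = [100, 120, 140, 160]
    slightly_off = [100, 121, 140, 160]  # one sample off by 1: tolerable
    corrupted = [100, 120, 140, 0]       # one sample badly corrupted
    print(round(psnr(golden, slightly_off), 2), is_acceptable(golden, slightly_off))
    print(round(psnr(golden, corrupted), 2), is_acceptable(golden, corrupted))
```

Here a one-unit error in one sample yields about 54.15 dB and is accepted even though the output is not numerically exact, while a large error in a single sample drops the PSNR to about 10.07 dB and is rejected. This illustrates why only some instructions are critical under such a metric.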
Keywords
software fault tolerance; application-level correctness assurance; application-level fidelity metrics; control flow analysis; data flow analysis; fault tolerance; program execution; program segment identification; runtime monitoring; soft errors; software-based fault detection; software-based fault recovery technique; static analysis; user satisfaction; Arrays; Handheld computers; Indexes; Measurement; Monitoring; Runtime; Vectors;
fLanguage
English
Publisher
IEEE
Conference_Title
Computer-Aided Design (ICCAD), 2011 IEEE/ACM International Conference on
Conference_Location
San Jose, CA
ISSN
1092-3152
Print_ISBN
978-1-4577-1399-6
Electronic_ISBN
1092-3152
Type
conf
DOI
10.1109/ICCAD.2011.6105319
Filename
6105319