• DocumentCode
    2679435
  • Title
    Assuring application-level correctness against soft errors
  • Author
    Cong, Jason; Gururaj, Karthik
  • Author_Institution
    Dept. of Comput. Sci., Univ. of California, Los Angeles, CA, USA
  • fYear
    2011
  • fDate
    7-10 Nov. 2011
  • Firstpage
    150
  • Lastpage
    157
  • Abstract
    Traditionally, research in fault tolerance has required architectural state to be numerically perfect for program execution to be correct. However, in many programs, even if execution is not 100% numerically correct, the program can still appear to execute correctly from the user's perspective. To quantify user satisfaction, application-level fidelity metrics (such as PSNR) can be used. The output of such applications is defined to be correct if the fidelity metrics satisfy a certain threshold. However, such applications still contain instructions whose outputs are critical, i.e., their correctness decides whether the overall quality of the program output is acceptable. In this paper, we present an analysis technique for identifying such critical program segments. More importantly, our technique is capable of guaranteeing application-level correctness through a combination of static analysis and runtime monitoring. Our static analysis consists of data flow analysis followed by control flow analysis to find static critical instructions which affect several instructions. Critical instructions are further refined into likely non-critical and likely critical sets in a profiling phase. At runtime, we use a monitoring scheme to monitor likely non-critical instructions and take remedial actions if some likely non-critical instructions become critical. Based on this analysis, we minimize the number of instructions that are duplicated and checked at runtime using a software-based fault detection and recovery technique [20]. Put together, our approach can lead to 22% average energy savings for multimedia applications while guaranteeing application-level correctness, when compared to a recent work [9], which cannot guarantee application-level correctness. Compared to the approach proposed in [20], which guarantees both application-level and numerical correctness, our method achieves 79% energy reduction.
  • Keywords
    data flow analysis; software fault tolerance; application-level correctness assurance; application-level fidelity metrics; control flow analysis; fault tolerance; program execution; program segment identification; runtime monitoring; soft errors; software-based fault detection; software-based fault recovery technique; static analysis; user satisfaction; Arrays; Handheld computers; Indexes; Measurement; Monitoring; Runtime; Vectors
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer-Aided Design (ICCAD), 2011 IEEE/ACM International Conference on
  • Conference_Location
    San Jose, CA
  • ISSN
    1092-3152
  • Print_ISBN
    978-1-4577-1399-6
  • Electronic_ISBN
    1092-3152
  • Type
    conf
  • DOI
    10.1109/ICCAD.2011.6105319
  • Filename
    6105319
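
The abstract defines an output as correct when a fidelity metric such as PSNR meets a threshold. A minimal sketch of that acceptance test follows; it is not the authors' implementation. The function names, the 8-bit peak value of 255, and the 30 dB threshold are illustrative assumptions that do not come from the paper.

```python
import math


def psnr(reference, observed, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length sample sequences.

    `peak` is the maximum possible sample value (255 assumes 8-bit samples).
    """
    if len(reference) != len(observed) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error between the fault-free and the observed output.
    mse = sum((r - o) ** 2 for r, o in zip(reference, observed)) / len(reference)
    if mse == 0:
        return math.inf  # numerically identical outputs
    return 10.0 * math.log10(peak ** 2 / mse)


def is_acceptable(reference, observed, threshold_db=30.0):
    """Application-level correctness check: PSNR must meet the threshold.

    The 30 dB default is a hypothetical value chosen for illustration.
    """
    return psnr(reference, observed) >= threshold_db


if __name__ == "__main__":
    golden = [10, 20, 30, 40]           # fault-free output samples
    low_bit_flip = [10, 21, 30, 40]     # error in a least-significant bit
    high_bit_flip = [10, 148, 30, 40]   # error in the most-significant bit (20 XOR 128)
    print(is_acceptable(golden, low_bit_flip))
    print(is_acceptable(golden, high_bit_flip))
```

Under these assumptions, the low-order bit flip leaves the output numerically wrong but still acceptable, while the high-order flip drops the PSNR below the threshold. This is the distinction between non-critical and critical errors that the abstract draws.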