• DocumentCode
    576922
  • Title

    Transient-Error Detection and Recovery via Reverse Computation and Checkpointing

  • Author

    Tan, Lanfang ; Tan, Qingping ; Xu, Jianjun ; Li, Jianli

  • Author_Institution
    Comput. Sch., Nat. Univ. of Defense Technol., Changsha, China
  • fYear
    2012
  • fDate
    24-28 Sept. 2012
  • Firstpage
    170
  • Lastpage
    178
  • Abstract
    Integrating error detection and recovery mechanisms becomes mandatory as the probability of transient errors increases. This study proposes a software-based fault-tolerant technique that achieves both detection and recovery. The technique rests on two main mechanisms: reverse computation and checkpointing. It is the first to use reverse computation for error detection, comparing the input data of the original computation with the output data of the reverse computation. Live variable analysis is used to reduce the overhead of checkpointing. A translation tool is implemented that makes the original source code fault tolerant, with automatic error detection and recovery. Fault injection and performance overhead experiments are performed to evaluate the technique. Experimental results show that most errors can be recovered with relatively low performance overhead.
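    The two mechanisms named in the abstract can be illustrated with a minimal sketch. It is not the authors' translation tool or their algorithm: the functions `forward`, `reverse`, and `protected_step`, the `fault` hook, and the choice of integer addition as the reversible computation are all hypothetical, picked only to show detection by reverse computation and rollback to a checkpoint restricted to live variables.

```python
def forward(x, k):
    """Hypothetical original computation (chosen because it is exactly reversible)."""
    return x + k


def reverse(y, k):
    """Reverse computation: reconstructs the input x from the output y."""
    return y - k


def protected_step(state, live, fault=None, max_retries=3):
    """Run forward() on state['x'] with error detection and recovery.

    state: dict of program variables (assumed to contain 'x' and 'k')
    live:  names of variables that are live after this step; only these
           are checkpointed, which is the overhead reduction described above
    fault: optional callable that corrupts the result once, simulating a
           transient error (an assumption made for this illustration)
    Returns the number of attempts that were needed.
    """
    # Checkpoint only the live variables, not the whole state.
    checkpoint = {name: state[name] for name in live}

    for attempt in range(1, max_retries + 1):
        y = forward(state["x"], state["k"])
        if fault is not None:
            # Inject the fault once; a transient error does not recur.
            y, fault = fault(y), None

        # Detection: the output of the reverse computation must equal
        # the input of the original computation.
        if reverse(y, state["k"]) == checkpoint["x"]:
            state["y"] = y
            return attempt

        # Recovery: roll the live variables back to the checkpoint and retry.
        state.update(checkpoint)

    raise RuntimeError("error persisted after retries; not a transient fault")
```

    In this sketch a variable that is not live after the step (for example a scratch value) is simply left out of `live`, so it is never copied into the checkpoint.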
  • Keywords
    checkpointing; software fault tolerance; software performance evaluation; fault injection; live variable analysis; performance overhead experiments; reverse computation; software-based fault tolerant technique; transient-error detection; transient-error recovery; translation tool; Fault tolerance; Fault tolerant systems; Instruments; Registers; Transient analysis; checksum; error recovery
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Cluster Computing Workshops (CLUSTER WORKSHOPS), 2012 IEEE International Conference on
  • Conference_Location
    Beijing
  • Print_ISBN
    978-1-4673-2893-7
  • Type

    conf

  • DOI
    10.1109/ClusterW.2012.33
  • Filename
    6355861