DocumentCode :
2866825
Title :
Data Flow Error Recovery with Checkpointing and Instruction-Level Fault Tolerance
Author :
Xiong, Lei ; Tan, Qingping
Author_Institution :
Sch. of Comput., Nat. Univ. of Defense Technol., Changsha, China
fYear :
2011
fDate :
20-22 Oct. 2011
Firstpage :
79
Lastpage :
85
Abstract :
Soft error detection and recovery are important to system reliability, especially as fabrication technology advances. Instruction-level soft error tolerance methods, which require no additional hardware, have been widely discussed. This paper proposes an application-level data flow error recovery approach that combines checkpointing with instruction-level fault tolerance. At the instruction level, code is divided into protected and unprotected code according to its sensitivity to hardware soft errors. In protected code, every data item is kept in two copies. At certain program points, such as store and branch instructions, the related data are checked. If the two copies are not identical, a soft error is assumed to have occurred, and the program state is restored from a prior checkpoint associated with the erroneous data. For each checked data item, the associated checkpoint is saved based on a program slice computed over the program from its beginning up to the checked data. Finally, the approach is implemented, and experimental results are reported to demonstrate it.
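The detect-and-restore cycle described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names `ProtectedState`, `checked_read`, `run_with_recovery`, and `demo` are hypothetical, the checkpoint here snapshots all protected data rather than a slice-derived subset, and duplication is modeled at the storage level rather than by duplicating instructions.

```python
import copy


class SoftErrorDetected(Exception):
    """Raised when the two copies of a protected value disagree."""


class ProtectedState:
    """Hypothetical model: every protected value is kept in two copies."""

    def __init__(self):
        self.primary = {}
        self.shadow = {}
        self._checkpoint = ({}, {})

    def assign(self, name, value):
        # Write both copies of the protected value.
        self.primary[name] = value
        self.shadow[name] = value

    def checked_read(self, name):
        # Compare the two copies at a check point (e.g. before a store or branch).
        if self.primary[name] != self.shadow[name]:
            raise SoftErrorDetected(name)
        return self.primary[name]

    def save_checkpoint(self):
        # Assumption: snapshot everything; the paper selects data via program slicing.
        self._checkpoint = (copy.deepcopy(self.primary), copy.deepcopy(self.shadow))

    def restore_checkpoint(self):
        # Restore copies of the snapshot so the checkpoint itself stays intact.
        self.primary = copy.deepcopy(self._checkpoint[0])
        self.shadow = copy.deepcopy(self._checkpoint[1])


def run_with_recovery(state, compute, max_retries=3):
    """Run compute(state); on a detected soft error, roll back and retry."""
    state.save_checkpoint()
    for _ in range(max_retries):
        try:
            return compute(state)
        except SoftErrorDetected:
            state.restore_checkpoint()
    raise RuntimeError("soft error persisted after retries")


def demo():
    state = ProtectedState()
    state.assign("a", 2)
    state.assign("b", 3)
    pending_faults = [True]  # inject exactly one simulated fault

    def compute(s):
        s.assign("sum", s.checked_read("a") + s.checked_read("b"))
        if pending_faults:
            pending_faults.pop()
            s.primary["sum"] ^= 1 << 4  # simulated bit flip in one copy only
        return s.checked_read("sum")  # check before the value is "stored"

    return run_with_recovery(state, compute)
```

In this sketch the first attempt detects the mismatch between the two copies of `sum`, the state is rolled back to the checkpoint, and the second attempt completes without a fault.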
Keywords :
checkpointing; codes; data flow analysis; program slicing; software fault tolerance; application level data flow error recovery; branch instruction; checkpointing technique; fabrication technology; instruction-level soft error tolerance method; program slice; protected codes; soft error detection; soft error recovery; store instruction; system reliability; unprotected codes based; Checkpointing; Fault tolerance; Fault tolerant systems; Flow graphs; Hardware; Software; Software algorithms; checkpointing; data flow error; fault recovery; instruction-level fault tolerance; soft errors;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Parallel and Distributed Computing, Applications and Technologies (PDCAT), 2011 12th International Conference on
Conference_Location :
Gwangju
Print_ISBN :
978-1-4577-1807-6
Type :
conf
DOI :
10.1109/PDCAT.2011.33
Filename :
6118959