DocumentCode :
3479616
Title :
Software Modification Aided Transient Error Tolerance for Embedded Systems
Author :
Shafik, Rishad Ahmed ; Rauwerda, Gerard ; Potman, Jordy ; Sunesen, Kim ; Pradhan, Dhiraj ; Mathew, Jinesh ; Sourdis, Ioannis
Author_Institution :
Univ. of Bristol, Bristol, UK
fYear :
2013
fDate :
4-6 Sept. 2013
Firstpage :
219
Lastpage :
226
Abstract :
Commercial off-the-shelf (COTS) components are increasingly being employed in embedded systems due to their high performance at low cost. With emerging reliability requirements, designing these components using traditional hardware redundancy incurs large overheads and time-consuming re-design and validation. To reduce design time under shorter time-to-market requirements, software-only reliable design techniques can provide an effective and low-cost alternative. This paper presents a novel, architecture-independent software modification tool, SMART (Software Modification Aided transient eRror Tolerance), for effective error detection and tolerance. To detect transient errors in the processor data path, control flow and memory at reasonable system overheads, the tool incorporates selective and non-intrusive data duplication and dynamic signature comparison. To mitigate the impact of the detected errors, it also facilitates further software modification that implements software-based check-pointing. Because the source-to-source modification is automatic, software-based and tailored to a given reliability requirement, the tool requires no re-design effort and no hardware- or compiler-level intervention. We evaluate the effectiveness of the tool using a Xentium processor based system as a case study of COTS based systems. Using various benchmark applications with a single-event upset (SEU) based error model, we show that up to 91% of the errors can be detected or masked with reasonable performance, energy and memory footprint overheads.
Keywords :
checkpointing; embedded systems; error detection; software architecture; software fault tolerance; time to market; COTS based systems; COTS components; SMART; Xentium processor based system; architecture-independent software modification tool; automatic software based source-to-source modification; commercial off-the-shelf component; control flow; design time reduction; dynamic signature; embedded systems; energy overhead; memory footprint overhead; nonintrusive data duplication; processor datapath; reliability requirement; selective data duplication; single-event upset based error model; software modification aided transient error tolerance; software-based check-pointing; software-only reliable design technique; time-to-market requirements; transient error detection; Computer architecture; Hardware; Libraries; Registers; Reliability; Software; Transient analysis; Embedded Systems; Error Detection; Fault Tolerance; Reliable Computing;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Digital System Design (DSD), 2013 Euromicro Conference on
Conference_Location :
Los Alamitos, CA
Type :
conf
DOI :
10.1109/DSD.2013.32
Filename :
6628280