DocumentCode
854280
Title
Error detection by selective procedure call duplication for low energy consumption
Author
Oh, Nahmsuk ; McCluskey, Edward J.
Author_Institution
Synopsys Inc., Mountain View, CA, USA
Volume
51
Issue
4
fYear
2002
fDate
12/1/2002 12:00:00 AM
Firstpage
392
Lastpage
402
Abstract
As commercial off-the-shelf (COTS) components are used in system-on-chip (SoC) design, a technique widely used in products from cellular phones to personal computers, it is difficult to modify the hardware design to implement hardware fault-tolerant techniques and improve system reliability. The two major concerns of this paper are to: (a) improve system reliability by detecting transient errors in hardware, and (b) reduce energy consumption by minimizing error-detection overhead. The objective of the new technique, selective procedure call duplication (SPCD), is to keep the system fault-secure (preserve data integrity) in the presence of transient errors, with minimum additional energy consumption. The basic approach is to duplicate computations and then compare their results to detect errors. There are 3 choices for duplicating computation: (1) duplicating every statement in the program and comparing results, (2) re-executing procedures through duplicated procedure calls and comparing results, and (3) re-executing the whole program and comparing the final results. SPCD combines choices (1) and (2). For a given program, SPCD analyzes the procedure-call behavior of the program, and then determines which procedures can have duplicated statements [choice (1)] and which procedure calls can be duplicated [choice (2)] to minimize energy consumption with reasonable error-detection latency. Then, SPCD transforms the original program into a new program that can detect errors with minimum additional energy consumption by re-executing the statements or procedures. SPCD was simulated with benchmark programs; it requires less than 25% additional energy for error detection than previous techniques that do not consider energy consumption.
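The two duplication choices that the abstract says SPCD combines can be illustrated with a minimal Python sketch. It is not the authors' transformation: the names `checked_call`, `dot`, `dot_duplicated`, `make_flaky`, and `TransientErrorDetected` are hypothetical, the fault injector is an assumption made only so the check can be exercised, and the energy-based analysis that decides which procedures receive which treatment is not reproduced.

```python
class TransientErrorDetected(Exception):
    """Raised when two executions of the same computation disagree."""


def checked_call(proc, *args):
    """Choice (2): execute the procedure twice and compare the results."""
    first = proc(*args)
    second = proc(*args)
    if first != second:
        raise TransientErrorDetected(proc.__name__)
    return first


def dot(xs, ys):
    """Unmodified procedure; protected only when called via checked_call."""
    total = 0
    for x, y in zip(xs, ys):
        total += x * y
    return total


def dot_duplicated(xs, ys):
    """Choice (1): each statement is duplicated on a shadow variable."""
    total = 0
    total_shadow = 0
    for x, y in zip(xs, ys):
        total += x * y
        total_shadow += x * y
        # Comparing inside the loop shortens detection latency at the
        # cost of extra work on every iteration.
        if total != total_shadow:
            raise TransientErrorDetected("dot_duplicated")
    return total


def make_flaky(proc):
    """Hypothetical fault injector: corrupts only the first result."""
    state = {"calls": 0}

    def flaky(*args):
        state["calls"] += 1
        result = proc(*args)
        # Flip the lowest bit on the first call to mimic a transient error.
        return result ^ 1 if state["calls"] == 1 else result

    return flaky
```

In this sketch, statement-level duplication checks after every update, while call-level duplication checks once per call; the abstract describes SPCD as choosing between the two per procedure to trade energy against error-detection latency.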
Keywords
error detection; fault tolerant computing; power consumption; software fault tolerance; commercial off-the-shelf components; data integrity; energy consumption reduction; error detection; error-detection latency; error-detection overhead minimisation; fault tolerance; hardware fault-tolerant techniques; instruction duplication; low energy consumption; low energy technique; low power technique; procedure cloning; selective procedure call duplication; software error detection; system reliability improvement; system-on-chip design technique; transient errors detection; Computer errors; Electromagnetic radiation; Energy consumption; Error correction codes; Fault tolerance; Hardware; Redundancy; Reliability; Satellites; Single event upset;
fLanguage
English
Journal_Title
Reliability, IEEE Transactions on
Publisher
IEEE
ISSN
0018-9529
Type
jour
DOI
10.1109/TR.2002.804735
Filename
1044337