Title :
A comparison of algorithm-based fault tolerance and traditional redundant self-checking for SEU mitigation
Author :
Samson, John R., Jr. ; DeLa Torre, Lou ; Wiley, Paris ; Stottlar, Thomas ; Ring, Jeff
Author_Institution :
Honeywell Space Syst., Clearwater, FL, USA
Abstract :
An algorithmic, checksum-based "EDAC" (error detection and correction) technique for matrix multiply operations is compared with the more traditional approach of redundant self-checking hardware with retry for mitigating single event upsets (SEUs) or transient errors in soft, radiation-tolerant signal processing hardware. Compared with the self-checking approach, the checksum-based EDAC technique offers a number of advantages, including lower size, weight, power, and cost. In a manner similar to the SECDED (single error correction/double error detection) EDAC technique commonly used in memory systems, the checksum-based technique can detect and correct errors on the same processing cycle, reducing transient error recovery latency and significantly improving system availability. The paper compares the checksum-based technique with the self-checking technique in terms of failure rates, upset rates, coverage, percentage overhead, detection latency, recovery latency, size, weight, power, and cost. The paper also looks at the percentage overhead of the checksum-based technique, which decreases as the size of the matrix increases.
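The abstract does not give the checksum construction itself, so the following is only a minimal sketch of one well-known form of algorithm-based fault tolerance for matrix multiply: a row of column sums is appended to the first operand and a column of row sums to the second, so that the product carries its own row and column checksums. All function names here (`matmul`, `with_column_checksums`, `with_row_checksums`, `check_and_correct`, `overhead_percent`) are hypothetical, integer data is assumed to avoid rounding issues, and the overhead figure is a simple multiply-add count, not a number taken from the paper.

```python
def matmul(a, b):
    """Plain matrix product of two lists-of-rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def with_column_checksums(a):
    """Return a copy of `a` with one extra row holding each column's sum."""
    return [row[:] for row in a] + [[sum(col) for col in zip(*a)]]


def with_row_checksums(b):
    """Return a copy of `b` with one extra column holding each row's sum."""
    return [row[:] + [sum(row)] for row in b]


def check_and_correct(cf):
    """Check a full checksum product matrix and fix one bad data element in place.

    Returns None if all checksums agree, or (row, col) of the corrected element.
    Raises ValueError for any pattern other than a single data-element error
    (for example, an upset in a checksum entry itself); this sketch only
    detects such cases and does not try to repair them.
    """
    rows, cols = len(cf) - 1, len(cf[0]) - 1
    bad_rows = [i for i in range(rows) if sum(cf[i][:cols]) != cf[i][cols]]
    bad_cols = [j for j in range(cols)
                if sum(cf[i][j] for i in range(rows)) != cf[rows][j]]
    if not bad_rows and not bad_cols:
        return None
    if len(bad_rows) == 1 and len(bad_cols) == 1:
        i, j = bad_rows[0], bad_cols[0]
        # The row checksum says what the row should sum to; apply the difference.
        cf[i][j] += cf[i][cols] - sum(cf[i][:cols])
        return (i, j)
    raise ValueError("error pattern is not a single data-element upset")


def overhead_percent(n):
    """Assumed estimate: extra multiply-adds of an (n+1)x(n+1) checksum product
    relative to the plain n x n product, ignoring checksum generation and checks."""
    return 100.0 * ((n + 1) ** 2 - n ** 2) / n ** 2


if __name__ == "__main__":
    a = [[1, 2], [3, 4]]
    b = [[5, 6], [7, 8]]
    cf = matmul(with_column_checksums(a), with_row_checksums(b))
    cf[1][0] ^= 1 << 4                      # simulate a single-bit upset
    print(check_and_correct(cf))            # location of the repaired element
    print([row[:2] for row in cf[:2]] == matmul(a, b))
    print(overhead_percent(4), overhead_percent(100))
```

Under this operation-count assumption, the relative overhead is (2n + 1)/n², which shrinks as the matrix grows and is consistent with the trend the abstract reports. Correction happens from the checksums alone, without recomputing the product, which is the property that distinguishes this style of scheme from a detect-and-retry approach.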
Keywords :
error correction; error detection; failure analysis; fault tolerant computing; matrix multiplication; radiation effects; radiation hardening (electronics); redundancy; SEU mitigation; algorithm-based fault tolerance; checksum-based EDAC technique; detection latency; error recovery latency; failure rates; matrix multiply operations; percentage overhead; processing cycle; radiation tolerant signal processing hardware; recovery latency; redundant self-checking; single event upset; system availability; transient errors; upset rates coverage; Application software; Control systems; Costs; Delay; Discrete Fourier transforms; Error correction; Fault tolerance; Hardware; Signal processing algorithms; Single event upset;
Conference_Title :
Digital Avionics Systems, 2001. DASC. 20th Conference
Conference_Location :
Daytona Beach, FL
Print_ISBN :
0-7803-7034-1
DOI :
10.1109/DASC.2001.964242