DocumentCode
1947516
Title
Overhead and reliability analysis of algorithm-based fault tolerance in FPGA systems
Author
Jacobs, Adam ; Cieslewski, Grzegorz ; George, Alan D.
Author_Institution
Dept. of Electr. & Comput. Eng., Univ. of Florida, Gainesville, FL, USA
fYear
2012
fDate
29-31 Aug. 2012
Firstpage
300
Lastpage
306
Abstract
Commercial SRAM-based, field-programmable gate arrays (FPGAs) have the capability to provide space applications with the necessary performance, energy-efficiency, and adaptability to meet next-generation mission requirements. However, mitigating an FPGA's susceptibility to radiation-induced faults is challenging. Triple-modular redundancy (TMR) techniques are traditionally used to mitigate radiation effects, but TMR incurs substantial overheads such as increased area and power requirements. In order to reduce these overheads while still providing sufficient radiation mitigation, we propose the use of algorithm-based fault tolerance (ABFT). We investigate the effectiveness of hardware-based ABFT logic in COTS FPGAs by developing multiple ABFT-enabled matrix multiplication designs, carefully analyzing resource usage and reliability tradeoffs, and proposing design modifications for higher reliability. We perform fault-injection testing on a Xilinx Virtex-5 platform to validate these ABFT designs, measure design vulnerability, and compare ABFT effectiveness to other fault-tolerance methods. Our hybrid ABFT design reduces total design vulnerability by 99% while only incurring 25% overhead over a baseline, non-protected design.
Keywords
SRAM chips; aerospace computing; fault tolerant computing; field programmable gate arrays; matrix multiplication; resource allocation; ABFT-enabled matrix multiplication; COTS FPGA; SRAM-based FPGA system; TMR technique; Xilinx Virtex-5 platform; algorithm-based fault tolerance; design vulnerability; fault-injection testing; field programmable gate array; hardware-based ABFT logic; next-generation mission requirement; overhead analysis; radiation-induced fault; reliability analysis; reliability tradeoff; resource usage; space application; static random access memory; triple-modular redundancy; Computer architecture; Fault tolerance; Fault tolerant systems; Field programmable gate arrays; Random access memory; Tunneling magnetoresistance;
fLanguage
English
Publisher
IEEE
Conference_Titel
Field Programmable Logic and Applications (FPL), 2012 22nd International Conference on
Conference_Location
Oslo
Print_ISBN
978-1-4673-2257-7
Electronic_ISBN
978-1-4673-2255-3
Type
conf
DOI
10.1109/FPL.2012.6339222
Filename
6339222