Title :
Overhead and reliability analysis of algorithm-based fault tolerance in FPGA systems
Author :
Jacobs, Adam ; Cieslewski, Grzegorz ; George, Alan D.
Author_Institution :
Dept. of Electr. & Comput. Eng., Univ. of Florida, Gainesville, FL, USA
Abstract :
Commercial SRAM-based, field-programmable gate arrays (FPGAs) have the capability to provide space applications with the necessary performance, energy-efficiency, and adaptability to meet next-generation mission requirements. However, mitigating an FPGA's susceptibility to radiation-induced faults is challenging. Triple-modular redundancy (TMR) techniques are traditionally used to mitigate radiation effects, but TMR incurs substantial overheads such as increased area and power requirements. In order to reduce these overheads while still providing sufficient radiation mitigation, we propose the use of algorithm-based fault tolerance (ABFT). We investigate the effectiveness of hardware-based ABFT logic in COTS FPGAs by developing multiple ABFT-enabled matrix multiplication designs, carefully analyzing resource usage and reliability tradeoffs, and proposing design modifications for higher reliability. We perform fault-injection testing on a Xilinx Virtex-5 platform to validate these ABFT designs, measure design vulnerability, and compare ABFT effectiveness to other fault-tolerance methods. Our hybrid ABFT design reduces total design vulnerability by 99% while only incurring 25% overhead over a baseline, non-protected design.
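The abstract does not state which checksum scheme the ABFT matrix multiplication designs use. As a rough software illustration of the general idea only, the sketch below assumes the classic row/column checksum encoding commonly associated with ABFT for matrix multiplication. The function names (`abft_matmul`, `find_faults`) and the use of exact integer arithmetic are assumptions made for this example; they do not describe the authors' hardware design.

```python
def matmul(a, b):
    """Plain matrix product of two lists-of-lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def abft_matmul(a, b):
    """Multiply a and b with checksum encoding (hypothetical helper).

    A gets an extra row holding its column sums, B gets an extra column
    holding its row sums. The product then carries a checksum row and a
    checksum column alongside the real result.
    """
    a_c = a + [[sum(col) for col in zip(*a)]]   # column-checksum A
    b_r = [row + [sum(row)] for row in b]       # row-checksum B
    return matmul(a_c, b_r)


def find_faults(full):
    """Return (bad_rows, bad_cols) whose data no longer match the checksums."""
    n, m = len(full) - 1, len(full[0]) - 1
    bad_rows = [i for i in range(n) if sum(full[i][:m]) != full[i][m]]
    bad_cols = [j for j in range(m)
                if sum(full[i][j] for i in range(n)) != full[n][j]]
    return bad_rows, bad_cols


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
full = abft_matmul(a, b)
clean = find_faults(full)      # no mismatches expected

full[0][1] += 1                # emulate a single corrupted output element
faulty = find_faults(full)     # the mismatching row and column locate it
```

In this scheme a single corrupted element shows up as exactly one inconsistent row and one inconsistent column, which is what lets a checker locate it without replicating the whole multiplier as TMR does.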
Keywords :
SRAM chips; aerospace computing; fault tolerant computing; field programmable gate arrays; matrix multiplication; resource allocation; ABFT-enabled matrix multiplication; COTS FPGA; SRAM-based FPGA system; TMR technique; Xilinx Virtex-5 platform; algorithm-based fault tolerance; design vulnerability; fault-injection testing; field programmable gate array; hardware-based ABFT logic; next-generation mission requirement; overhead analysis; radiation-induced fault; reliability analysis; reliability tradeoff; resource usage; space application; static random access memory; triple-modular redundancy; Computer architecture; Fault tolerance; Fault tolerant systems; Field programmable gate arrays; Random access memory; Tunneling magnetoresistance;
Conference_Title :
Field Programmable Logic and Applications (FPL), 2012 22nd International Conference on
Conference_Location :
Oslo
Print_ISBN :
978-1-4673-2257-7
Electronic_ISBN :
978-1-4673-2255-3
DOI :
10.1109/FPL.2012.6339222