  • DocumentCode
    1947516
  • Title
    Overhead and reliability analysis of algorithm-based fault tolerance in FPGA systems
  • Author
    Jacobs, Adam ; Cieslewski, Grzegorz ; George, Alan D.
  • Author_Institution
    Dept. of Electr. & Comput. Eng., Univ. of Florida, Gainesville, FL, USA
  • fYear
    2012
  • fDate
    29-31 Aug. 2012
  • Firstpage
    300
  • Lastpage
    306
  • Abstract
    Commercial SRAM-based field-programmable gate arrays (FPGAs) can provide space applications with the performance, energy efficiency, and adaptability needed to meet next-generation mission requirements. However, mitigating an FPGA's susceptibility to radiation-induced faults is challenging. Triple-modular redundancy (TMR) techniques are traditionally used to mitigate radiation effects, but TMR incurs substantial overheads, such as increased area and power requirements. To reduce these overheads while still providing sufficient radiation mitigation, we propose the use of algorithm-based fault tolerance (ABFT). We investigate the effectiveness of hardware-based ABFT logic in commercial-off-the-shelf (COTS) FPGAs by developing multiple ABFT-enabled matrix multiplication designs, carefully analyzing resource usage and reliability tradeoffs, and proposing design modifications for higher reliability. We perform fault-injection testing on a Xilinx Virtex-5 platform to validate these ABFT designs, measure design vulnerability, and compare the effectiveness of ABFT to that of other fault-tolerance methods. Our hybrid ABFT design reduces total design vulnerability by 99% while incurring only 25% overhead over a baseline, non-protected design.
  • Keywords
    SRAM chips; aerospace computing; fault tolerant computing; field programmable gate arrays; matrix multiplication; resource allocation; ABFT-enabled matrix multiplication; COTS FPGA; SRAM-based FPGA system; TMR technique; Xilinx Virtex-5 platform; algorithm-based fault tolerance; design vulnerability; fault-injection testing; field programmable gate array; hardware-based ABFT logic; next-generation mission requirement; overhead analysis; radiation-induced fault; reliability analysis; reliability tradeoff; resource usage; space application; static random access memory; triple-modular redundancy; Computer architecture; Fault tolerance; Fault tolerant systems; Field programmable gate arrays; Random access memory
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Field Programmable Logic and Applications (FPL), 2012 22nd International Conference on
  • Conference_Location
    Oslo
  • Print_ISBN
    978-1-4673-2257-7
  • Electronic_ISBN
    978-1-4673-2255-3
  • Type
    conf
  • DOI
    10.1109/FPL.2012.6339222
  • Filename
    6339222