Title :
HHC: Hierarchical hardware checkpointing to accelerate fault recovery for SRAM-based FPGAs
Author :
Enshan Yang ; Keheng Huang ; Yu Hu ; Xiaowei Li ; Jian Gong ; Hongjin Liu ; Bo Liu
Author_Institution :
State Key Lab. of Comput. Archit., Inst. of Comput. Technol., China
Abstract :
As feature sizes shrink to the nanometer scale, SRAM-based FPGAs are increasingly vulnerable to soft errors. Checkpointing is an effective fault recovery technique that restores a faulty system to its previous fault-free state. Because the function of the system must be suspended during checkpoint saving and checkpoint restoring, the Mean Time to Repair (MTTR) is critical to system performance. In this work, we propose a hierarchical hardware checkpointing (HHC) technique that combines a high-speed on-chip checkpoint with a low-speed off-chip checkpoint to accelerate fault recovery for SRAM-based FPGAs. Most single event effect (SEE) faults can be recovered from the high-speed on-chip checkpoint, which significantly reduces the MTTR of the system. The memory resource occupation of the on-chip checkpoint is low because HHC stores only the logic states of user bits and check information for configuration bits. Experimental results show that, compared with traditional off-chip checkpoint strategies, the proposed technique reduces the MTTR of the system by 94.30%. In addition, the memory resource occupation is 11.11% of the FPGA's memory resources, which is somewhat high but can be further optimized.
Keywords :
SRAM chips; checkpointing; field programmable gate arrays; HHC; MTTR; SRAM-based FPGA; checkpoint restoring; checkpoint saving; fault recovery technique; hierarchical hardware checkpointing technique; high-speed on-chip checkpoint; low-speed off-chip checkpoint; mean time to repair; single event effect faults; soft errors; Bandwidth; Circuit faults; Error correction codes; Hardware; System-on-chip; ECC; SRAM-based FPGAs; fault recovery; hardware checkpoint; hierarchical
Conference_Title :
On-Line Testing Symposium (IOLTS), 2013 IEEE 19th International
Conference_Location :
Chania
DOI :
10.1109/IOLTS.2013.6604078