• DocumentCode
    688408
  • Title
    Device View Redundancy: An Adaptive Low-Overhead Fault Tolerance Mechanism for Many-Core System
  • Author
    Wentao Jia ; Chunyuan Zhang ; Jian Fu
  • Author_Institution
    Nat. Key Lab. of Parallel & Distrib. Process., Nat. Univ. of Defense Technol., Changsha, China
  • fYear
    2013
  • fDate
    13-15 Nov. 2013
  • Firstpage
    2080
  • Lastpage
    2087
  • Abstract
    The continued increase in fault rates in integrated circuits makes processors, especially many-core processors, more susceptible to errors. Meanwhile, most systems and applications do not need full fault coverage, which incurs excessive overhead, so on-demand fault tolerance is desirable for them. In this paper, we propose an adaptive low-overhead fault tolerance mechanism for many-core systems, called Device View Redundancy (DVR). It treats fault tolerance as a device that an application can configure and use when high reliability is needed. In addition, DVR exploits idle resources to keep the overhead of fault tolerance low, based on the observation that the utilization of many-core systems is low due to a lack of parallelism in applications. Experiments show that DVR reduces performance overhead by 16% to 98% compared with full Dual Modular Redundancy (DMR).
  • Keywords
    fault tolerant computing; multiprocessing systems; DVR; adaptive low-overhead fault tolerance mechanism; device view redundancy; many-core processor; many-core system; on-demand fault tolerance; Fault tolerant systems; Hardware; Performance evaluation; Redundancy; Registers; dynamic coupling; idle resource exploitation; low-overhead; many core system; on-demand redundancy;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    High Performance Computing and Communications & 2013 IEEE International Conference on Embedded and Ubiquitous Computing (HPCC_EUC), 2013 IEEE 10th International Conference on
  • Conference_Location
    Zhangjiajie
  • Type
    conf
  • DOI
    10.1109/HPCC.and.EUC.2013.299
  • Filename
    6832182