• DocumentCode
    1653918
  • Title
    An adaptive approach for online fault management in many-core architectures
  • Author
    Bolchini, Cristiana ; Miele, Antonio ; Sciuto, Donatella
  • Author_Institution
    Dip. Elettron. e Inf., Politec. di Milano, Milan, Italy
  • fYear
    2012
  • Firstpage
    1429
  • Lastpage
    1432
  • Abstract
    This paper presents a dynamic scheduling solution to achieve fault tolerance in many-core architectures. Triple Modular Redundancy is applied to the multi-threaded application to dynamically mitigate the effects of both permanent and transient faults, and to identify and isolate damaged units. The approach targets the best performance while balancing the use of healthy resources to limit wear-out and aging effects, which cause permanent damage. Experimental results on synthetic case studies are reported to validate the ability to tolerate faults while optimizing performance and resource usage.
  • Keywords
    ageing; dynamic scheduling; fault diagnosis; fault tolerance; microprocessor chips; multiprocessing systems; processor scheduling; wear; adaptive approach; aging effect; damaged unit identification; damaged unit isolation; dynamic scheduling solution; fault tolerance; many-core architecture; multithreaded application; online fault management; permanent fault mitigation; transient fault mitigation; triple modular redundancy; wear-out effect; Computer architecture; Fault tolerance; Fault tolerant systems; Hardware; Instruction sets; Synchronization;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Design, Automation & Test in Europe Conference & Exhibition (DATE), 2012
  • Conference_Location
    Dresden
  • ISSN
    1530-1591
  • Print_ISBN
    978-1-4577-2145-8
  • Type
    conf
  • DOI
    10.1109/DATE.2012.6176589
  • Filename
    6176589