• DocumentCode
    2344424
  • Title

    Adaptive execution assistance for multiplexed fault-tolerant chip multiprocessors

  • Author

    Subramanyan, Pramod ; Singh, Virendra ; Saluja, Kewal K. ; Larsson, Erik

  • Author_Institution
    Princeton Univ., Princeton, NJ, USA
  • fYear
    2011
  • fDate
    9-12 Oct. 2011
  • Firstpage
    419
  • Lastpage
    426
  • Abstract
    Relentless scaling of CMOS fabrication technology has made contemporary integrated circuits increasingly susceptible to transient faults, wearout-related permanent faults, intermittent faults and process variations. Therefore, mechanisms to mitigate the effects of decreased reliability are expected to become essential components of future general-purpose microprocessors. In this paper, we introduce a new throughput-efficient architecture for multiplexed fault-tolerant chip multiprocessors (CMPs). Our proposal relies on the new technique of adaptive execution assistance, which dynamically varies instruction outcomes forwarded from the leading core to the trailing core based on measures of trailing core performance. We identify policies and design low-overhead hardware mechanisms to achieve this. Our work also introduces a new priority-based thread-scheduling algorithm for multiplexed architectures that improves multiplexed fault-tolerant CMP throughput by prioritizing stalled threads. Through simulation-based evaluation, we find that our proposal delivers 17.2% higher throughput than perfect dual modular redundant (DMR) execution and outperforms previous proposals for throughput-efficient CMP architectures.
  • Keywords
    CMOS integrated circuits; circuit simulation; computer architecture; fault tolerant computing; integrated circuit reliability; microprocessor chips; multi-threading; CMOS fabrication technology; CMP; adaptive execution assistance technique; integrated circuit; intermittent fault; low overhead hardware mechanism; multiplexed fault tolerant chip multiprocessors; priority based thread scheduling algorithm; simulation based evaluation; throughput-efficient architecture; transient fault; wearout related permanent fault; Lead; Multiplexing; Reliability; Throughput;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Computer Design (ICCD), 2011 IEEE 29th International Conference on
  • Conference_Location
    Amherst, MA
  • ISSN
    1063-6404
  • Print_ISBN
    978-1-4577-1953-0
  • Type

    conf

  • DOI
    10.1109/ICCD.2011.6081432
  • Filename
    6081432