• DocumentCode
    2150380
  • Title
    Modeling and analysis of fault-tolerant distributed memories for Networks-on-Chip
  • Author
    BanaiyanMofrad, Abbas ; Dutt, Nikil ; Girao, Gustavo
  • Author_Institution
    Center for Embedded Computer Systems, University of California, Irvine, USA
  • fYear
    2013
  • fDate
    18-22 March 2013
  • Firstpage
    1605
  • Lastpage
    1608
  • Abstract
    Advances in technology scaling make Networks-on-Chip (NoCs) increasingly susceptible to failures that pose various reliability challenges. With the increasing area occupied by different on-chip memories, maintaining fault tolerance of distributed on-chip memories becomes a major design challenge. We propose a system-level design methodology for scalable fault tolerance of distributed on-chip memories in NoCs. We introduce a novel reliability clustering model for fault-tolerance analysis and shared redundancy management of on-chip memory blocks. We perform extensive design space exploration, applying the proposed reliability clustering to a block-redundancy fault-tolerant scheme to evaluate the tradeoffs among reliability, performance, and overheads. Evaluations on a 64-core chip multiprocessor (CMP) with an 8x8 mesh NoC show that distinct strategies in our case study may yield up to 20% improvement in performance and 25% improvement in energy savings across different benchmarks, and uncover interesting design configurations.
  • Keywords
    Analytical models; Fault tolerant systems; Redundancy; Reliability engineering; System-on-chip;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Design, Automation & Test in Europe Conference & Exhibition (DATE), 2013
  • Conference_Location
    Grenoble, France
  • ISSN
    1530-1591
  • Print_ISBN
    978-1-4673-5071-6
  • Type
    conf
  • DOI
    10.7873/DATE.2013.326
  • Filename
    6513772