• DocumentCode
    3273274
  • Title
    A fine-grained link-level fault-tolerant mechanism for networks-on-chip
  • Author
    Vitkovskiy, Arseniy; Soteriou, Vassos; Nicopoulos, Chrysostomos
  • Author_Institution
    Dept. of Electr. Eng. & Inf. Technol., Cyprus Univ. of Technol., Limassol, Cyprus
  • fYear
    2010
  • fDate
    3-6 Oct. 2010
  • Firstpage
    447
  • Lastpage
    454
  • Abstract
    Silicon technology scaling is continuously enabling denser integration capabilities. However, this comes at the expense of higher variability and susceptibility to wear-out. With an escalating number of on-chip components expected to be defective in near-future chips, modern parallel systems, such as Chip Multi-Processors (CMPs), become especially vulnerable to these faults. Just a single link failure in the underlying Network-on-Chip (NoC) may cause inter-tile communication to halt and even deadlock, rendering the chip useless. While fault-tolerant routing schemes do exist, they can only handle a finite number of link faults. In this paper, we address permanent wire failures that can occur in on-chip parallel links at manufacture time or during operation. Instead of marking the entire link as faulty, we present a methodology whereby the Partially Faulty Link (PFL) can still be used to transfer data between NoC routers, thus maintaining network connectivity, improving the yield and extending the lifetime of the chip, and allowing for graceful performance degradation. To achieve this, we devise architectural augmentations to both the router and link micro-architectures, along with link fault detection, diagnosis, and reconfiguration at wire-level granularity. Statistical link-level fault models show the usability of PFLs, while relevant load-balancing routing algorithms and low-cost retransmission mechanisms are also presented and coupled to the proposed architecture. Hardware synthesis demonstrates the feasibility of the proposed extensions to the baseline NoC architecture. Results obtained from full-system simulations show that high-performance NoCs are realizable in the presence of PFLs.
  • Keywords
    fault tolerant computing; microprocessor chips; multiprocessing systems; network routing; network-on-chip; resource allocation; chip multiprocessors; fault-tolerant routing schemes; fine-grained link-level fault-tolerant mechanism; link fault detection; link fault diagnosis; link microarchitectures; load-balancing routing algorithms; low-cost retransmission mechanisms; modern parallel systems; network connectivity; networks-on-chip; on-chip components; on-chip parallel links; partially faulty link; statistical link-level fault models; wire granularity; Computer architecture; Decoding; Hardware; Receivers; Routing; System-on-a-chip; Wire;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Computer Design (ICCD), 2010 IEEE International Conference on
  • Conference_Location
    Amsterdam
  • ISSN
    1063-6404
  • Print_ISBN
    978-1-4244-8936-7
  • Type
    conf
  • DOI
    10.1109/ICCD.2010.5647663
  • Filename
    5647663