• DocumentCode
    174678
  • Title

    Hermes: Architecting a top-performing fault-tolerant routing algorithm for Networks-on-Chips

  • Author

    Iordanou, C. ; Soteriou, V. ; Aisopos, K.

  • Author_Institution
    Dept. of Electr. Eng., Comput. Eng. & Inf., Cyprus Univ. of Technol., Limassol, Cyprus
  • fYear
    2014
  • fDate
    19-22 Oct. 2014
  • Firstpage
    424
  • Lastpage
    431
  • Abstract
    Networks-on-Chips (NoCs) are experiencing escalating susceptibility to wear-out and reduced reliability, with the risk of becoming the key point of failure in an entire multicore chip. Aiming at seamless NoC operation in the presence of faulty communication links, in this paper we propose Hermes, a highly robust, distributed, and lightweight fault-tolerant routing algorithm whose performance degrades gracefully as the number of faulty links increases. Hermes is a deadlock-free hybrid routing algorithm that uses load-balancing routing on fault-free paths to sustain high performance, while providing pre-reconfigured escape path selection in the vicinity of faults. Additionally, Hermes identifies non-communicating network partitions in scenarios where faulty links are topologically densely distributed. An extensive experimental evaluation, including traffic benchmarks gathered from full-system chip multi-processor simulations, shows that Hermes improves network throughput by up to 3× compared with prior art.
  • Keywords
    fault tolerance; integrated circuit reliability; multiprocessing systems; network routing; network-on-chip; resource allocation; Hermes; NoC; deadlock-free hybrid routing algorithm; fault-free paths; fault-tolerant routing algorithm; faulty communication links; faulty link counts; full-system chip multiprocessor simulations; load-balancing routing; multicore chip; networks-on-chips; prereconfigured escape path selection; wearout; Broadcasting; Multicore processing; Ports (Computers); Registers; Routing; System recovery; Topology; Network-on-chip; chip multi-processor; fault-tolerance; reliability; routing algorithm
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer Design (ICCD), 2014 32nd IEEE International Conference on
  • Conference_Location
    Seoul
  • Type

    conf

  • DOI
    10.1109/ICCD.2014.6974715
  • Filename
    6974715