• DocumentCode
    2788031
  • Title
    ARIADNE: Agnostic Reconfiguration in a Disconnected Network Environment
  • Author
    Aisopos, Konstantinos; DeOrio, Andrew; Peh, Li-Shiuan; Bertacco, Valeria
  • Author_Institution
    Princeton Univ., Princeton, NJ, USA
  • fYear
    2011
  • fDate
    10-14 Oct. 2011
  • Firstpage
    298
  • Lastpage
    309
  • Abstract
    Extreme transistor technology scaling is raising growing concerns about device reliability: the expected lifetime of individual transistors in complex chips is quickly decreasing, and the problem is expected to worsen at future technology nodes. With complex designs increasingly relying on Networks-on-Chip (NoCs) for on-chip data transfers, a NoC must continue to operate even in the face of many transistor failures. Specifically, it must be able to reconfigure and reroute packets around faults to enable continued operation, i.e., generate new routing paths to replace the old ones upon a failure. In addition to these reliability requirements, NoCs must maintain low latency and high throughput within a very low area budget. In this work, we propose a distributed reconfiguration solution named Ariadne, targeting large, aggressively scaled, unreliable NoCs. Ariadne uses up*/down* routing for fast, high-bandwidth operation, and upon any number of concurrent network failures in any location, it reconfigures to discover new resilient paths connecting the surviving nodes. Experimental results show that Ariadne provides a 40%-140% latency improvement (when subject to 50 faults in a 64-node NoC) over other state-of-the-art on-chip fault-tolerant solutions, while meeting the low area budget of on-chip routers with an overhead of just 1.97%.
  • Keywords
    fault tolerant computing; integrated circuit reliability; network routing; network-on-chip; ARIADNE; Ariadne; agnostic reconfiguration; complex chips; complex designs; concurrent network failures; device reliability; disconnected network environment; distributed reconfiguration solution; extreme transistor technology scaling; latency improvement; low area budget; networks-on-chip; on-chip data transfers; on-chip routers; on-chip state-of-the-art fault tolerant solutions; reliability requirements; routing paths; surviving nodes; transistor failures; unreliable NoC; Network topology; Reliability; Routing; System recovery; System-on-a-chip; Topology; Transistors; NOC; distributed; reconfiguration; resilience;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Parallel Architectures and Compilation Techniques (PACT), 2011 International Conference on
  • Conference_Location
    Galveston, TX
  • ISSN
    1089-795X
  • Print_ISBN
    978-1-4577-1794-9
  • Type
    conf
  • DOI
    10.1109/PACT.2011.61
  • Filename
    6113838