Regional Information Center for Science and Technology - Vicis: A reliable network for unreliable silicon

DocumentCode :

500798

Title :

Vicis: A reliable network for unreliable silicon

Author :

Fick, David ; DeOrio, Andrew ; Hu, Jin ; Bertacco, Valeria ; Blaauw, David ; Sylvester, Dennis

Author_Institution :

Dept. of Electr. Eng. & Comput. Sci., Univ. of Michigan, Ann Arbor, MI, USA

fYear :

2009

fDate :

26-31 July 2009

Firstpage :

812

Lastpage :

817

Abstract :

Process scaling has given designers billions of transistors to work with. As feature sizes near the atomic scale, extensive variation and wearout inevitably make margining uneconomical or impossible. The ElastIC project seeks to address this by creating a large-scale chip-multiprocessor that can self-diagnose, adapt, and heal. Creating large, flexible designs in this environment naturally lends itself to the repetitive nature of network-on-chip (NoC), but the loss of a single link or router will result in complete network failure. In this work we present Vicis, an ElastIC-style NoC that can tolerate the loss of many network components due to wearout-induced hard faults. Vicis uses the inherent redundancy in the network and its routers to maintain correct operation while incurring a much lower area overhead than previously proposed N-modular redundancy (NMR) based solutions. Each router has a built-in self-test (BIST) that diagnoses the locations of hard faults and runs a number of algorithms to best use ECC, port swapping, and a crossbar bypass bus to mitigate them. The routers also work together to run distributed algorithms that solve network-wide problems, protecting the network against critical failures in individual routers. In this work we show that with stuck-at fault rates as high as 1 in 2000 gates, Vicis will continue to operate with approximately half of its routers still functional and communicating.
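
As a rough illustration of what a stuck-at fault rate of 1 in 2000 gates implies, the sketch below computes the expected number of faulty gates in a block and the probability that the block contains no fault at all. It assumes faults strike gates independently with equal probability, which the abstract does not state, and the gate counts are hypothetical values chosen only for illustration, not figures from the paper.

```python
import math

# Per-gate stuck-at fault probability quoted in the abstract.
FAULT_RATE = 1 / 2000


def expected_faults(gates: int, rate: float = FAULT_RATE) -> float:
    """Expected number of faulty gates in a block of `gates` gates."""
    return gates * rate


def prob_fault_free(gates: int, rate: float = FAULT_RATE) -> float:
    """Probability that no gate in the block is faulty.

    Assumes independent, identically distributed faults (an assumption
    made for this sketch, not a claim from the paper).
    """
    return (1.0 - rate) ** gates


if __name__ == "__main__":
    # Hypothetical block sizes; the record does not give router gate counts.
    for gates in (1_000, 10_000, 50_000):
        print(
            f"{gates:>6} gates: "
            f"expected faults = {expected_faults(gates):.1f}, "
            f"P(fault-free) = {prob_fault_free(gates):.3e}"
        )
```

Under these assumptions the chance of a fault-free block falls off roughly as exp(-gates/2000), which is consistent with the abstract's emphasis on tolerating faults inside each router rather than relying on routers being fault-free.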

Keywords :

built-in self test; distributed algorithms; failure analysis; microprocessor chips; network-on-chip; redundancy; ECC; ElastIC project; N-modular redundancy; Vicis; built-in-self-test; crossbar bypass bus; distributed algorithms; hard fault; large-scale chip-multiprocessor; network failure; network redundancy; network-on-chip; port swapping; process scaling; routers; stuck-at fault rates; Electric breakdown; Fault diagnosis; Fault tolerance; Network-on-a-chip; Power system management; Redundancy; Silicon; System recovery; Telecommunication traffic; Testing; Built-in-Self-Test; Fault Tolerance; Hard Faults; N-Modular Redundancy; Network-on-Chip; Reconfiguration; Torus;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Design Automation Conference, 2009. DAC '09. 46th ACM/IEEE

Conference_Location :

San Francisco, CA

ISSN :

0738-100X

Print_ISBN :

978-1-60558-497-3

Type :

conf

Filename :

5227053

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=500798