Abstract:
This embedded tutorial presents an overview of the issues related to the test, diagnosis and fault tolerance of Networks-on-Chip (NoCs). The main motivation is the increasing interest in NoC-based designs in academia and industry. This design option is becoming an important trend because of its advantages in tackling the challenges of complex SoC design. However, for NoCs to become a real standard and an industrial reality, the issues related to their testing must also be well understood and mastered. First, the challenges of testing and diagnosing faults in NoCs are presented. A NoC is basically made up of three main components: network interfaces, routers and communication channels. Regarding the network interfaces and routers, many works have addressed the problem of NoC testing, suggesting that a wide variety of standard DfT solutions can be used. However, a deeper evaluation of the approaches borrowed from board-level and chip-level testing shows that much better results, in terms of silicon overhead, test time and diagnosability, can be obtained with test approaches specific to NoCs. Testing and diagnosis of NoC channels is also important. The huge number of interconnects, combined with shrinking chip dimensions, makes the NoC prone to a growing number of interconnect faults. The capability to detect interconnect faults in NoC-based SoCs is therefore mandatory for yield improvement. Moreover, fault diagnosis of NoC interconnect links can help fault-tolerance approaches to mitigate the faults and maintain the network service. The most influential NoC test and diagnosis strategies are presented and comparatively discussed in this tutorial. In fact, the correctness of the on-chip network behavior must be ensured not only in test mode, but also during the mission of the system.
In mission mode, all system communication is implemented through the NoC, and designers assume a certain Quality of Service (QoS) level according to specific design techniques. However, current technologies are becoming quite sensitive to noise and cosmic particles, which can cause transient faults. Thus, it is important to ensure that the NoC service is not compromised by such faults, and fault-tolerance techniques for the communication infrastructure must be considered when designing a NoC-based system. Various techniques to detect and recover from transient and permanent faults are discussed in this embedded tutorial. In recent years, a number of testing and fault-tolerance approaches have been proposed for NoCs. Although this is a relatively new topic, current research already covers a considerable spectrum, and analyzing it at this point can help to summarize the scientific and technological advances made so far and to identify the open issues that still need to be tackled.