Title :
Fault Localizing End-to-End Flow Control Protocol for Networks-on-Chip
Author :
Schley, Gert ; Batzolis, N. ; Radetzki, Martin
Author_Institution :
Embedded Syst. Eng. Group (ES), Univ. of Stuttgart, Stuttgart, Germany
fDate :
Feb. 27 2013-March 1 2013
Abstract :
Reliable data exchange between the cores of a Network-on-Chip (NoC) is of great importance for correct system behavior. However, data exchange is hampered by the occurrence of transient and permanent faults in the NoC's communication structure (links). These faults may cause corruption or loss of data, which in turn may lead to performance degradation or, in the worst case, to complete system failure. When data is corrupted by a transient fault, a common measure is to retransmit the data. To ensure that faulty data is retransmitted, so-called flow control protocols are applied. In the case of permanent faults, a simple retransmission is not sufficient. Permanent faults in, e.g., links lead to a permanent corruption of data as long as they are not located. Thus, even retransmissions get corrupted. In this paper we present a fault-tolerant end-to-end protocol applicable to arbitrary NoC topologies. It ensures reliable end-to-end communication in the presence of transient and permanent faults in the interconnection structure. By means of the protocol's online diagnostic ability, it is capable of locating faulty links and switches without any additional diagnosis hardware.
Keywords :
electronic data interchange; failure analysis; fault diagnosis; fault tolerant computing; multiprocessor interconnection networks; network topology; network-on-chip; performance evaluation; protocols; NoC communication structure; arbitrary NoC topologies; diagnosis hardware; fault localizing end-to-end flow control protocol; fault tolerant end-to-end protocol; faulty data retransmission; interconnection structure; networks-on-chip; performance degradation; permanent data corruption; permanent faults; protocol online diagnostic ability; reliable data exchange; reliable end-to-end communication; system failure; transient faults; Buffer storage; Protocols; Receivers; Reliability; Routing; Software; Transient analysis; Fault Tolerance; Networks-on-Chip; Protocol;
Conference_Titel :
Parallel, Distributed and Network-Based Processing (PDP), 2013 21st Euromicro International Conference on
Conference_Location :
Belfast
Print_ISBN :
978-1-4673-5321-2
Electronic_ISBN :
1066-6192
DOI :
10.1109/PDP.2013.74