Title :
Dependable reconfigurable computing design diversity and self repair
Author :
Mitra, Subhasish ; McCluskey, Edward J.
Author_Institution :
Intel Corp., Stanford Univ., USA
Abstract :
Summary form only given, as follows. We demonstrate the power of reconfigurable computing in enabling cost-effective implementations of dependable systems. New concurrent error detection techniques based on practical implementations of design diversity are presented. Field reconfigurability of reconfigurable hardware is utilized to design self-healing systems capable of autonomous recovery from temporary errors and repair of permanent faults. The applicability of these techniques is demonstrated through implementations on commercial reconfigurable hardware platforms. An error detection scheme based on diverse duplication compares the outputs of two "different" implementations of the same function and indicates an error when a mismatch occurs. The idea of such a technique is derived from the general concept of design diversity. The conventional notion of design diversity is qualitative and relies on "independent" generation of "different" implementations. A metric to quantify design diversity is presented, along with synthesis algorithms to efficiently design systems with error detection based on diverse duplication. In traditional dependable systems using hardware redundancy, fault tolerance is realized by detecting errors and locating the faulty chip or faulty board (Field Replaceable Unit or FRU) to be replaced by field service engineers. For systems designed using reconfigurable hardware, the FRU is very fine-grained, such as a logic block or a routing resource (e.g., a pass-transistor-based switch or a logic lookup table in Field Programmable Gate Arrays). Thus, in the case of a permanent fault, a cost-effective repair scheme is obtained using an alternative configuration in which the faulty parts are replaced with originally unused resources.
A new self-repairing reconfigurable computing architecture based on dual FPGAs with embedded "soft" micro-controllers is presented. This architecture allows the implemented system to recover from temporary errors and repair itself from permanent faults with minimum impact on system performance, while ensuring very high data integrity and availability without external intervention. These capabilities make this architecture useful for a variety of dependable applications, including unmanned remote applications such as deep-space exploration.
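The diverse-duplication check described in the abstract can be illustrated with a minimal software sketch. This is not the authors' synthesis algorithm or their diversity metric. It only shows the comparison step, assuming a 3-input majority function as the duplicated logic. All names here (`majority_sop`, `majority_mux`, `checked_majority`, `stuck_at_zero`) are hypothetical and chosen for illustration.

```python
def majority_sop(a, b, c):
    """Implementation 1 (assumed): sum-of-products form of 3-input majority."""
    return (a & b) | (b & c) | (a & c)


def majority_mux(a, b, c):
    """Implementation 2 (assumed): structurally different, multiplexer-style form.

    If a and b agree, they decide the majority; otherwise c breaks the tie.
    """
    return a if a == b else c


def checked_majority(a, b, c, impl1=majority_sop, impl2=majority_mux):
    """Run both implementations and flag an error when their outputs differ."""
    out1 = impl1(a, b, c)
    out2 = impl2(a, b, c)
    error = out1 != out2  # a mismatch indicates an error in one copy
    return out1, error


def stuck_at_zero(a, b, c):
    """Hypothetical faulty copy: output stuck at 0 regardless of inputs."""
    return 0


if __name__ == "__main__":
    # Fault-free case: both copies agree, so no error is flagged.
    print(checked_majority(1, 1, 0))
    # Faulty first copy: the mismatch is flagged for this input.
    print(checked_majority(1, 1, 0, impl1=stuck_at_zero))
```

Note that a mismatch is raised only for inputs on which the fault actually changes an output. With the stuck-at-0 copy, the input (0, 0, 1) produces no mismatch because the correct output is 0 anyway.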
Keywords :
error detection; fault tolerant computing; field programmable gate arrays; reconfigurable architectures; table lookup; autonomous recovery; concurrent error detection techniques; dependable reconfigurable computing design diversity; fault tolerance; field programmable gate arrays; field reconfigurability; logic lookup table; self-healing systems; Algorithm design and analysis; Computer architecture; Fault detection; Field programmable gate arrays; Hardware; Logic design; Programmable logic arrays; Reconfigurable logic; Redundancy; Switches;
Conference_Title :
Evolvable Hardware, 2002. Proceedings. NASA/DoD Conference on
Print_ISBN :
0-7695-1718-8
DOI :
10.1109/EH.2002.1029857