Title :
Evaluating reliability improvements of fault tolerant array processors using algorithm-based fault tolerance
Author :
Tao, D.L. ; Kantawala, Kamal
Author_Institution :
Dept. of Electr. Eng., State Univ. of New York, Stony Brook, NY, USA
Date :
6/1/1997
Abstract :
Algorithm-based fault tolerance (ABFT) provides low-cost error protection for VLSI processor arrays used in real-time digital signal processing. The main objective of incorporating an ABFT technique in a processor array is to improve its reliability. However, previous ABFT approaches have been evaluated only in terms of their error-detecting and error-correcting capabilities; the resulting reliability improvement has never been addressed. In this paper, we develop a stochastic model for an array processor incorporating ABFT that takes the behavior of transient/intermittent failures and hardware overhead into account. This model is then used to evaluate the reliability and reliability improvement of several existing ABFT techniques that tolerate single faults. A user can therefore evaluate a number of ABFT techniques and trade off reliability against cost prior to implementation. Moreover, we have conducted extensive simulation experiments, and the simulation results validate the proposed model.
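For readers unfamiliar with ABFT, the following is a minimal sketch of one widely known single-fault-tolerant scheme, checksum-encoded matrix multiplication. It is offered only as an illustration of the kind of technique the abstract refers to; it is not the stochastic model developed in the paper, and the record does not say which ABFT techniques the paper evaluates. All function names are hypothetical, and integer data is assumed so that checksums can be compared exactly (floating-point data would need a tolerance).

```python
# Illustrative sketch of checksum-based ABFT for matrix multiplication.
# Assumption: integer matrices, at most one faulty data element in the product.

def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def add_column_checksum(a):
    """Append one extra row holding the sum of each column of a."""
    return a + [[sum(col) for col in zip(*a)]]

def add_row_checksum(b):
    """Append to each row of b one extra entry holding that row's sum."""
    return [row + [sum(row)] for row in b]

def locate_single_error(c_full):
    """Return (i, j) of a single faulty data element, or None if all checks pass."""
    n = len(c_full) - 1       # number of data rows
    m = len(c_full[0]) - 1    # number of data columns
    bad_rows = [i for i in range(n) if sum(c_full[i][:m]) != c_full[i][m]]
    bad_cols = [j for j in range(m)
                if sum(c_full[r][j] for r in range(n)) != c_full[n][j]]
    if not bad_rows and not bad_cols:
        return None
    if len(bad_rows) == 1 and len(bad_cols) == 1:
        return bad_rows[0], bad_cols[0]
    raise ValueError("error pattern is not a single data-element fault")

def correct_single_error(c_full, i, j):
    """Rebuild element (i, j) in place from the row checksum of row i."""
    m = len(c_full[0]) - 1
    others = sum(c_full[i][:m]) - c_full[i][j]
    c_full[i][j] = c_full[i][m] - others
```

Multiplying the column-checksum form of A by the row-checksum form of B yields a product whose last row and last column are checksums of the data part, so a single corrupted element shows up as exactly one failing row check and one failing column check. The extra row and column are a simple instance of the hardware or computation overhead that a reliability evaluation has to weigh against the protection gained.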
Keywords :
array signal processing; error correction codes; error detection codes; fault tolerant computing; parallel processing; VLSI processor arrays; algorithm-based fault tolerance; array processor; error correction; error detection; fault tolerant array processors; hardware overhead; low-cost error protection; processor array; real-time digital signal processing; reliability improvements; simulation experiments; stochastic model; Error correction; Fault detection; Fault tolerance; Fault tolerant systems; Hardware; Multiprocessing systems; Protection; Signal processing algorithms; Stochastic processes; Very large scale integration;
Journal_Title :
Computers, IEEE Transactions on