Title :
Probability of correctness of processor-array outputs using periodic concurrent error detection
Author :
Chen, Paul P. ; Mourad, Antoine N. ; Fuchs, W. Kent
Author_Institution :
Geoworks, Alameda, CA, USA
fDate :
6/1/1996
Abstract :
Processor arrays, featuring modularity, regular interconnection, and high parallelism, are well suited for VLSI/WSI implementation and for specific applications with high computational requirements. Error detection and recovery are important for some applications of processor arrays. Concurrent error detection (CED) techniques, which check normal system operations, are designed to detect errors caused by transient and intermittent faults. However, CED techniques typically suffer from costly hardware penalties or performance costs. This paper describes the periodic application of concurrent error detection (PACED), a technique that reduces the performance costs incurred through the use of time-redundant CED in processor array architectures. The application of CED is varied in both time and space to provide probabilistic detection of errors in processor arrays. The probability of correctness of outputs from processor arrays is studied. Formulae are derived that predict, upon error detection, the amount of possibly erroneous output for single processors, linear arrays, and 2-dimensional mesh processor arrays. The results indicate that the error coverage can be surprisingly high when PACED is applied in processor arrays, for example 95% coverage when checking is performed 50% of the time.
Keywords :
error detection; fault tolerant computing; multiprocessing systems; parallel processing; probability; program processors; system recovery; 2-dimensional mesh processor arrays; VLSI; WSI; computational requirements; correctness probability; error recovery; linear arrays; modularity; parallelism; performance costs; periodic concurrent error detection; processor-array outputs; regular interconnection; single processors; Computer applications; Costs; Degradation; Error analysis; Error correction; Fault detection; Fault tolerance; Hardware; Parallel processing; Very large scale integration;
Journal_Title :
Reliability, IEEE Transactions on