DocumentCode :
1524398
Title :
Node-covering, error-correcting codes and multiprocessors with very high average fault tolerance
Author :
Dutt, Shantanu ; Mahapatra, Nihar R.
Author_Institution :
Dept. of Electr. Eng. & Comput. Sci., Illinois Univ., Chicago, IL, USA
Volume :
46
Issue :
9
fYear :
1997
fDate :
9/1/1997 12:00:00 AM
Firstpage :
997
Lastpage :
1015
Abstract :
Structural fault tolerance (SFT) is the ability of a multiprocessor to reconfigure around faulty processors or links in order to preserve its original processor interconnection structure; In this paper, we focus on the design of SFT multiprocessors that have low switch and link overheads, but can tolerate a very large number of processor faults on the average. Most previous work has concentrated on deterministic k-fault-tolerant (k-FT) designs in which exactly k spare processors and some spare switches and links are added to construct multiprocessors that can tolerate any k processor faults. However, after k faults are reconfigured around, much of the extra links and switches can remain unutilized. It is possible within the basic node-covering framework, which was introduced by Dutt and Hayes as an efficient k-FT design method, to design FT multiprocessors that have the same amount of switches and links as, say, a two-FT deterministic design, but have s spare processors, where s≫2, so that, on the average, k=Θ(s) (k⩽s) processor failures can be reconfigured around. Such designs utilize the spare link and switch capacity very efficiently, and are called probabilistic FT designs. An elegant and powerful method to construct covering graphs or CG´s, which are key to obtaining the probabilistic FT designs, is to use linear error-correcting codes (ECCs). We show how to construct probabilistic designs with very high average fault tolerance but low wiring and switch overhead using ECCs like the 2D-parity, full-two, 3D-parity, and full-three codes
Keywords :
error correction codes; fault tolerant computing; multiprocessing systems; multiprocessor interconnection networks; Average fault tolerance; FT multiprocessors; SFT multiprocessors; VLSI layout; deterministic fault tolerance; fault tolerance; fault-tolerant multiprocessors; faulty processors; linear error-correcting codes; matching; multiprocessors; network flow; node-covering; processor interconnection; reconfiguration; reconfigure; structural fault tolerance; Capacity planning; Design methodology; Error correction codes; Fault tolerance; Multiprocessor interconnection; Signal processing; Switches; Topology; Very large scale integration; Wiring;
fLanguage :
English
Journal_Title :
Computers, IEEE Transactions on
Publisher :
ieee
ISSN :
0018-9340
Type :
jour
DOI :
10.1109/12.620481
Filename :
620481
Link To Document :
بازگشت