DocumentCode :
1087148
Title :
Partitioned encoding schemes for algorithm-based fault tolerance in massively parallel systems
Author :
Rexford, Jennifer ; Jha, Niraj K.
Author_Institution :
Dept. of Electr. Eng. & Comput. Sci., Michigan Univ., Ann Arbor, MI, USA
Volume :
5
Issue :
6
fYear :
1994
fDate :
6/1/1994
Firstpage :
649
Lastpage :
653
Abstract :
Considers the applicability of algorithm-based fault tolerance (ABFT) to massively parallel scientific computation. Existing ABFT schemes can provide effective fault tolerance at low cost for computation on matrices of moderate size; however, the methods do not scale well to floating-point operations on large systems. This short note proposes the use of a partitioned linear encoding scheme to provide scalability. Matrix algorithms employing this scheme are presented and compared to current ABFT schemes. It is shown that the partitioned scheme provides scalable linear codes with improved numerical properties at the cost of only a small increase in hardware and time overhead.
Keywords :
error correction codes; error detection codes; fault tolerant computing; matrix algebra; parallel architectures; software reliability; ABFT; algorithm based fault tolerance; checksum code; error correction; error detection; massively parallel systems; matrix algorithms; partitioned encoding; partitioned scheme; scalability; transient errors; Encoding; Error correction codes; Fault detection; Fault tolerance; Fault tolerant systems; Hardware; Linear code; Matrix decomposition; Partitioning algorithms; Signal processing algorithms
fLanguage :
English
Journal_Title :
Parallel and Distributed Systems, IEEE Transactions on
Publisher :
IEEE
ISSN :
1045-9219
Type :
jour
DOI :
10.1109/71.285610
Filename :
285610