Title :
An efficient method to reduce roundoff error in matrix multiplication with algorithm-based fault tolerance
Author :
Zhang, Qihong ; Kim, Jung H.
Author_Institution :
Center for Adv. Comput. Studies, Southwestern Louisiana Univ., Lafayette, LA, USA
Abstract :
Algorithm-Based Fault Tolerance (ABFT) schemes have been proposed by a number of researchers recently. Although all errors can theoretically be detected and corrected with these techniques, practical problems, especially roundoff errors, degrade their performance drastically. In this paper, we propose a new scheme, the Extended Mantissa Checksum (EMC) test, in which the mantissa of the product of two input matrices is divided into two sections and extended for fault detection and correction. With this scheme, the numbers of undetected errors and false alarms decrease substantially, and error coverage improves significantly. In addition, time latency is short and hardware overhead is small compared with other schemes.
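The roundoff problem the abstract refers to can be illustrated with the classic row/column checksum test used in ABFT matrix multiplication. The sketch below is not the EMC scheme, whose details the abstract does not give. It is a minimal pure-Python illustration, and all function names and the tolerance values are assumptions made for this example. It shows why a floating-point checksum comparison needs a tolerance: a threshold that is too tight raises false alarms, while one that is too loose lets small errors go undetected.

```python
import random

def matmul(a, b):
    """Plain product of two matrices given as lists of rows."""
    n, m, p = len(a), len(b), len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(m)) for j in range(p)]
            for i in range(n)]

def column_checksum(a):
    """Append a row holding the sum of each column of a."""
    return a + [[sum(col) for col in zip(*a)]]

def row_checksum(b):
    """Append a column holding the sum of each row of b."""
    return [row + [sum(row)] for row in b]

def checksum_residual(cf):
    """Largest |stored checksum - recomputed sum| over data rows and columns
    of a full-checksum matrix cf (last row and last column are checksums)."""
    n, p = len(cf) - 1, len(cf[0]) - 1
    row_res = [abs(cf[i][p] - sum(cf[i][:p])) for i in range(n)]
    col_res = [abs(cf[n][j] - sum(cf[i][j] for i in range(n)))
               for j in range(p)]
    return max(row_res + col_res)

if __name__ == "__main__":
    random.seed(0)  # fixed seed so the demonstration is repeatable
    size = 6        # hypothetical matrix size chosen for illustration
    a = [[random.uniform(-1, 1) for _ in range(size)] for _ in range(size)]
    b = [[random.uniform(-1, 1) for _ in range(size)] for _ in range(size)]

    # Multiply the checksum-augmented inputs; the product carries its own
    # row and column checksums.
    cf = matmul(column_checksum(a), row_checksum(b))

    tolerance = 1e-9  # assumed threshold; real schemes must tune this
    print("fault-free residual:", checksum_residual(cf))
    print("flagged as faulty:", checksum_residual(cf) > tolerance)

    cf[2][3] += 1e-3  # inject a simulated fault into one product element
    print("residual after fault:", checksum_residual(cf))
    print("flagged as faulty:", checksum_residual(cf) > tolerance)
```

With integer inputs the residual is exactly zero, so an equality test suffices. With floating-point inputs the fault-free residual is typically small but nonzero, which is why the comparison threshold drives the trade-off between false alarms and undetected errors that the abstract describes.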
Keywords :
algorithm theory; digital arithmetic; error correction; error detection; fault tolerant computing; matrix algebra; roundoff errors; algorithm-based fault tolerance; error coverage; extended mantissa checksum test; false alarms; fault correction; fault detection; floating point test; hardware overhead; matrix multiplication; roundoff error; time latency; Arithmetic; Computer errors; Degradation; Delay; Electromagnetic compatibility; Error correction; Fault detection; Fault tolerance; Roundoff errors; Testing;
Conference_Title :
Wafer Scale Integration, 1994. Proceedings., Sixth Annual IEEE International Conference on
Conference_Location :
San Francisco, CA
Print_ISBN :
0-7803-1850-1
DOI :
10.1109/ICWSI.1994.291235