DocumentCode :
3117536
Title :
Highly-reliable integer matrix multiplication via numerical packing
Author :
Anarado, Ijeoma ; Anam, Mohammad Ashraful ; Anastasia, Davide ; Verdicchio, Fabio ; Andreopoulos, Yiannis
Author_Institution :
Electr. Eng. Dept., Univ. Coll. London, London, UK
fYear :
2013
fDate :
8-10 July 2013
Firstpage :
19
Lastpage :
24
Abstract :
The generic matrix multiply (GEMM) routine comprises the compute- and memory-intensive part of many information retrieval, relevance ranking and object recognition systems. Because of the prevalence of GEMM in these applications, ensuring its robustness to transient hardware faults is of paramount importance for highly-efficient/highly-reliable systems. This is currently accomplished via error control coding (ECC) or via dual modular redundancy (DMR) approaches that produce a separate set of “parity” results to allow for fault detection in GEMM. We introduce a third family of methods for fault detection in integer matrix products based on the concept of numerical packing. The key difference of the new approach against ECC and DMR approaches is the production of redundant results within the numerical representation of the inputs rather than as a separate set of parity results. In this way, high reliability is ensured within integer matrix products while allowing for: (i) in-place storage; (ii) usage of any off-the-shelf 64-bit floating-point GEMM routine; (iii) computational overhead that is independent of the GEMM inner dimension. The only detriment against a conventional (i.e. fault-intolerant) integer matrix multiplication based on 32-bit floating-point GEMM is the sacrifice of approximately 30.6% of the bitwidth of the numerical representation. However, unlike ECC methods that can reliably detect only up to a few faults per GEMM computation (typically two), the proposed method attains more than “12 nines” reliability, i.e. it will only fail to detect 1 fault out of more than 1 trillion arbitrary faults in the GEMM operations. As such, it achieves reliability that approaches that of DMR, at a very small fraction of its cost. Specifically, a single-threaded software realization of our proposal on an Intel i7-3632QM 2.2GHz processor (Ivy Bridge architecture with AVX support) incurs, on average, only 19% increase of execution time against an optimized, fault-intolerant, 32-bit GEMM routine over a range of matrix sizes and it remains more than 80% more efficient than a DMR-based GEMM.
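The abstract does not spell out the packing construction, so the following is only a minimal sketch of the general idea of carrying a redundant copy of each result inside a single 64-bit floating-point value. It is not the authors' scheme. The shift `K`, the scaling by `1 + 2^-K`, and the helper names `pack`, `gemm` and `unpack_and_check` are all assumptions made for illustration, and the plain-Python `gemm` merely stands in for an off-the-shelf double-precision routine.

```python
# Illustrative sketch only: NOT the construction from the paper.
# Assumption: each integer a is packed as a * (1 + 2**-K), so every output
# of the product holds the true integer result c twice: once in the integer
# part and once, scaled by 2**-K, in the fractional part.
K = 26                      # hypothetical shift; outputs must satisfy |c| < 2**(K - 1)
SCALE = 2.0 ** -K


def pack(a):
    """Embed a redundant copy of every integer entry of `a` in one float."""
    return [[v * (1.0 + SCALE) for v in row] for row in a]


def gemm(a, b):
    """Plain triple-loop matrix product, standing in for a 64-bit GEMM."""
    inner = len(b)
    cols = len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]


def unpack_and_check(c_packed):
    """Split each output into its two copies and flag any mismatch."""
    result, faulty = [], []
    for i, row in enumerate(c_packed):
        out = []
        for j, x in enumerate(row):
            hi = round(x)                  # copy held in the integer part
            lo = (x - hi) * 2.0 ** K       # copy held in the fractional part
            if lo != hi:
                faulty.append((i, j))      # the two copies disagree
            out.append(hi)
        result.append(out)
    return result, faulty


if __name__ == "__main__":
    A = [[1, 2], [3, 4]]
    B = [[5, 6], [7, 8]]
    C_packed = gemm(pack(A), B)
    print(unpack_and_check(C_packed))      # fault-free run

    C_packed[0][0] += 1.0                  # simulate a transient fault in one output
    print(unpack_and_check(C_packed)[1])   # position of the corrupted output
```

In this sketch the check touches each output element once, so its cost grows with the size of the output matrix rather than with the inner dimension, which is in the spirit of property (iii) above. The price is bitwidth: with `K = 26`, results are limited to roughly 25 bits plus sign so that both copies fit exactly in the 53-bit significand of a double.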
Keywords :
fault diagnosis; floating point arithmetic; matrix multiplication; microprocessor chips; multiprocessing systems; DMR approaches; ECC approaches; GEMM; Intel i7-3632QM 2.2GHz processor; compute-intensive part; dual modular redundancy approaches; error control coding; fault detection; generic matrix multiply routine; highly-reliable integer matrix multiplication; information retrieval systems; memory-intensive part; numerical packing; object recognition systems; off-the-shelf 64-bit floating-point GEMM routine; relevance ranking systems; single-threaded software realization; transient hardware faults; Circuit faults; Error correction codes; Fault detection; Fault tolerant systems; Redundancy; fault tolerance; integer matrix multiplication; numerical packing; soft errors; sum-of-products;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
On-Line Testing Symposium (IOLTS), 2013 IEEE 19th International
Conference_Location :
Chania
Type :
conf
DOI :
10.1109/IOLTS.2013.6604045
Filename :
6604045