DocumentCode :
159484
Title :
GPGPUs ECC efficiency and efficacy
Author :
Oliveira, Daniel A. G. ; Rech, P. ; Pilla, Laercio L. ; Navaux, Philippe Olivier Alexandre ; Carro, Luigi
Author_Institution :
Inst. of Inf., Fed. Univ. of Rio Grande do Sul, Porto Alegre, Brazil
fYear :
2014
fDate :
1-3 Oct. 2014
Firstpage :
209
Lastpage :
215
Abstract :
In this paper we assess and discuss the efficiency and overhead of the Error-Correcting Code (ECC) mechanism available on modern GPGPUs, which are increasingly used for both High Performance Computing and safety-critical applications. Resilience to both radiation-induced silent data corruption and functional interruption is addressed experimentally and analytically. The experimental analysis demonstrates that ECC significantly reduces the occurrence of silent data corruption but may not be sufficient to guarantee high reliability. Moreover, ECC increases the GPGPU functional interruption rate. Finally, the performance and reliability of ECC are compared with those of Algorithm-Based Fault Tolerance and Duplication With Comparison strategies.
Keywords :
electronic engineering computing; error correction codes; fault tolerant computing; ECC efficiency; GPGPU; algorithm-based fault tolerance; error-correcting code mechanism; functional interruption rate; high performance computing; radiation-induced silent data corruption; safety-critical application; Benchmark testing; Error correction codes; Graphics processing units; Instruction sets; Interrupters; Neutrons; Reliability; ABFT; ECC; GPGPU; duplication with comparison; functional interruption; silent data corruption;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Defect and Fault Tolerance in VLSI and Nanotechnology Systems (DFT), 2014 IEEE International Symposium on
Conference_Location :
Amsterdam
Print_ISBN :
978-1-4799-6154-2
Type :
conf
DOI :
10.1109/DFT.2014.6962085
Filename :
6962085