Regional Information Center for Science and Technology - Improving GPU Robustness by making use of faulty parts

DocumentCode :

2344248

Title :

Improving GPU Robustness by making use of faulty parts

Author :

Durytskyy, Artem ; Zahran, Mohamed ; Karri, Ramesh

Author_Institution :

ECE Dept., NYU, New York, NY, USA

fYear :

2011

fDate :

9-12 Oct. 2011

Firstpage :

346

Lastpage :

351

Abstract :

With hundreds of processing units in current state-of-the-art graphics processing units (GPUs), the probability that one or more processing units fail due to permanent faults, during fabrication or post-deployment, increases drastically. In our experiments we found that the loss of a single streaming multiprocessor (SM) in an 8-SM GPU resulted in as much as 16% performance loss. The default method for dealing with faulty SMs is to turn them off. Although faulty SMs cannot be trusted to completely execute a single kernel (program assigned to an SM) correctly, we show that we can still make use of these SMs to improve system throughput by generating and supplying high-level hints to other functional SMs. By making the faulty SMs supply hints to functional SMs, we have been able to achieve an average speed-up of about 16% over the baseline case (wherein the faulty SMs are turned off). The proposed technique requires minimal hardware overhead and is highly scalable.

Keywords :

computer graphic equipment; coprocessors; fault tolerant computing; microprocessor chips; 8-SM GPU; GPU robustness; faulty parts; graphics processing units; minimal hardware overhead; single streaming multiprocessor; system throughput; Accuracy; Benchmark testing; Graphics processing unit; Hardware; Instruction sets; Kernel; Radiation detectors;

fLanguage :

English

Publisher :

IEEE

Conference_Titel :

Computer Design (ICCD), 2011 IEEE 29th International Conference on

Conference_Location :

Amherst, MA

ISSN :

1063-6404

Print_ISBN :

978-1-4577-1953-0

Type :

conf

DOI :

10.1109/ICCD.2011.6081422

Filename :

6081422

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2344248