Title :
Poster: Evaluating Error Resiliency of GPGPU Applications
Author :
Fang, Bo ; Wei, Jiesheng ; Pattabiraman, Karthik ; Ripeanu, Matei
Abstract :
GPUs were originally designed for error-resilient workloads. Today, GPUs are also used in error-sensitive applications, e.g. General Purpose GPU (GPGPU) applications. The goal of this project is to investigate the error resilience of GPGPU applications and understand their reliability characteristics. To this end, we employ fault injection on real GPU hardware. We find that, compared to CPUs, GPU platforms lead to a higher rate of silent data corruption, a major concern since these errors are not flagged at runtime and often remain latent. We also find that out-of-bounds memory accesses are the most critical cause of crashes in GPGPU applications.
Conference_Title :
2012 SC Companion: High Performance Computing, Networking, Storage and Analysis (SCC)
Conference_Location :
Salt Lake City, UT
Print_ISBN :
978-1-4673-6218-4
DOI :
10.1109/SC.Companion.2012.289