DocumentCode :
1919684
Title :
Abstract: Evaluating Error Resiliency of GPGPU Applications
Author :
Bo Fang ; Jiesheng Wei ; Pattabiraman, Karthik ; Ripeanu, Matei
Author_Institution :
Dept. of Electr. & Comput. Eng., Univ. of British Columbia, Vancouver, BC, Canada
fYear :
2012
fDate :
10-16 Nov. 2012
Firstpage :
1502
Lastpage :
1503
Abstract :
We present a preliminary evaluation of the error resilience of GPGPU applications. We find that, compared to CPUs, these platforms lead to a higher rate of silent data corruption, a major concern since these errors are not flagged at runtime and often remain latent. We also find that out-of-bound memory accesses are the most critical cause of crashes. In the future, we will first focus on techniques to reduce the frequency of silent data corruption, as this is critical to most HPC applications.
Keywords :
graphics processing units; parallel processing; GPGPU; HPC application; error resiliency evaluation; out-of-bound memory access; silent data corruption;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
High Performance Computing, Networking, Storage and Analysis (SCC), 2012 SC Companion
Conference_Location :
Salt Lake City, UT
Print_ISBN :
978-1-4673-6218-4
Type :
conf
DOI :
10.1109/SC.Companion.2012.288
Filename :
6496071