DocumentCode
1919684
Title
Abstract: Evaluating Error Resiliency of GPGPU Applications
Author
Fang, Bo ; Wei, Jiesheng ; Pattabiraman, Karthik ; Ripeanu, Matei
Author_Institution
Dept. of Electr. & Comput. Eng., Univ. of British Columbia, Vancouver, BC, Canada
fYear
2012
fDate
10-16 Nov. 2012
Firstpage
1502
Lastpage
1503
Abstract
We present a preliminary evaluation of the error resilience of GPGPU applications. We find that, compared to CPUs, these platforms lead to a higher rate of silent data corruption, a major concern because these errors are not flagged at runtime and often remain latent. We also find that out-of-bound memory accesses are the leading cause of crashes. In future work, we will first focus on techniques to reduce the frequency of silent data corruption, as this is critical to most HPC applications.
Keywords
graphics processing units; parallel processing; GPGPU; HPC application; error resiliency evaluation; out-of-bound memory access; silent data corruption;
fLanguage
English
Publisher
IEEE
Conference_Title
High Performance Computing, Networking, Storage and Analysis (SCC), 2012 SC Companion:
Conference_Location
Salt Lake City, UT
Print_ISBN
978-1-4673-6218-4
Type
conf
DOI
10.1109/SC.Companion.2012.288
Filename
6496071