• DocumentCode
    1919684
  • Title
    Abstract: Evaluating Error Resiliency of GPGPU Applications
  • Author
    Bo Fang ; Jiesheng Wei ; Pattabiraman, Karthik ; Ripeanu, Matei
  • Author_Institution
    Dept. of Electr. & Comput. Eng., Univ. of British Columbia, Vancouver, BC, Canada
  • fYear
    2012
  • fDate
    10-16 Nov. 2012
  • Firstpage
    1502
  • Lastpage
    1503
  • Abstract
    We present a preliminary evaluation of the error resilience of GPGPU applications. We find that, compared to CPUs, these platforms exhibit a higher rate of silent data corruption, a major concern because such errors are not flagged at runtime and often remain latent. We also find that out-of-bound memory accesses are the main cause of crashes. In future work, we will first focus on techniques to reduce the frequency of silent data corruption, as this is critical to most HPC applications.
  • Keywords
    graphics processing units; parallel processing; GPGPU; HPC application; error resiliency evaluation; out-of-bound memory access; silent data corruption;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    High Performance Computing, Networking, Storage and Analysis (SCC), 2012 SC Companion:
  • Conference_Location
    Salt Lake City, UT
  • Print_ISBN
    978-1-4673-6218-4
  • Type
    conf
  • DOI
    10.1109/SC.Companion.2012.288
  • Filename
    6496071