  • DocumentCode
    2923591
  • Title
    BLOCKWATCH: Leveraging similarity in parallel programs for error detection

  • Author
    Wei, Jiesheng ; Pattabiraman, Karthik

  • Author_Institution
    Dept. of Electr. & Comput. Eng., Univ. of British Columbia (UBC), Vancouver, BC, Canada
  • fYear
    2012
  • fDate
    25-28 June 2012
  • Firstpage
    1
  • Lastpage
    12
  • Abstract
    The scaling of Silicon devices has exacerbated the unreliability of modern computer systems, and power constraints have necessitated the involvement of software in hardware error detection. Simultaneously, the multi-core revolution has impelled software to become parallel. Therefore, there is a compelling need to protect parallel programs from hardware errors. Parallel programs' tasks have significant similarity in control data due to the use of high-level programming models. In this study, we propose BLOCKWATCH to leverage the similarity in parallel programs' control data for detecting hardware errors. BLOCKWATCH statically extracts the similarity among different threads of a parallel program and checks the similarity at runtime. We evaluate BLOCKWATCH on seven SPLASH-2 benchmarks to measure its performance overhead and error detection coverage. We find that BLOCKWATCH incurs an average overhead of 16% across all programs, and provides an average SDC coverage of 97% for faults in the control data.
  • Keywords
    error detection; multiprocessing systems; parallel programming; program diagnostics; software performance evaluation; software reliability; BLOCKWATCH; SPLASH-2 benchmarks; Silicon devices; computer systems; hardware error detection; high-level programming models; multicore revolution; parallel program protection; similarity leverage; static analysis; Computers; Hardware; Instruction sets; Multicore processing; Runtime; Control-data; Parallel programs; Runtime checks; SPMD; Static Analysis;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Dependable Systems and Networks (DSN), 2012 42nd Annual IEEE/IFIP International Conference on
  • Conference_Location
    Boston, MA
  • ISSN
    1530-0889
  • Print_ISBN
    978-1-4673-1624-8
  • Electronic_ISBN
    1530-0889
  • Type
    conf
  • DOI
    10.1109/DSN.2012.6263959
  • Filename
    6263959