  • DocumentCode
    2569947
  • Title
    Compiler-assisted generation of error-detecting parallel programs
  • Author
    Roy-Chowdhury, A.; Banerjee, P.
  • Author_Institution
    IBM Thomas J. Watson Res. Center, Yorktown Heights, NY, USA
  • fYear
    1996
  • fDate
    25-27 Jun 1996
  • Firstpage
    360
  • Lastpage
    369
  • Abstract
    We have developed an automated, compile-time approach to generating error-detecting parallel programs. The compiler identifies statements that implement affine transformations within the program and automatically inserts code for computing, manipulating, and comparing checksums in order to detect data errors at runtime. Statements that do not implement affine transformations are checked by duplication. Where possible, checksums are reused from one loop to the next rather than recomputed for every statement. A global dataflow analysis determines the points at which checksums need to be recomputed. We also use a novel method of specifying the data distributions of the check data using data distribution directives, so that the computations on the original data and the corresponding check computations are performed on different processors. Results on the time overhead and error coverage of the error-detecting parallel programs relative to the original programs are presented on an Intel Paragon distributed memory multicomputer.
  • Keywords
    automatic programming; data flow analysis; distributed memory systems; parallel programming; parallelising compilers; program debugging; program diagnostics; software fault tolerance; Intel Paragon distributed memory multicomputer; affine transformations; check computations; checksums; compile time approach; compiler-assisted program generation; data distributions; data error detection; duplication; error-detecting parallel programs; global dataflow analysis; runtime; Computer errors; Concurrent computing; Contracts; Data analysis; Distributed computing; Encoding; Fault tolerance; Performance analysis; Program processors; Runtime;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Proceedings of Annual Symposium on Fault Tolerant Computing, 1996
  • Conference_Location
    Sendai
  • ISSN
    0731-3071
  • Print_ISBN
    0-8186-7262-5
  • Type
    conf
  • DOI
    10.1109/FTCS.1996.534621
  • Filename
    534621
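
The abstract's core idea, that a statement implementing an affine transformation admits a cheap checksum invariant while other statements fall back to duplication, can be illustrated with a minimal single-process Python sketch. It is not the authors' compiler or the code it generates: the names `affine_stmt`, `propagate_checksum`, `checksum_check`, `nonaffine_stmt`, `duplicated_check`, and `flip_first` are hypothetical, the plain sum checksum and the way it is carried between consecutive loops are assumptions, and data errors are simulated by perturbing one element.

```python
import math
from typing import Callable, List, Optional, Tuple

# Hypothetical fault-injection hook: takes a result list, returns a corrupted copy.
Fault = Optional[Callable[[List[float]], List[float]]]


def affine_stmt(b: List[float], c: float, d: float, fault: Fault = None) -> List[float]:
    """Original loop A[i] = c * B[i] + d, an affine transformation of B."""
    a = [c * x + d for x in b]
    return fault(a) if fault else a


def propagate_checksum(sum_b: float, n: int, c: float, d: float) -> float:
    """Predicted checksum of A = c * B + d, derived from B's checksum alone.

    By linearity, sum(A) = c * sum(B) + n * d, so (in this sketch) a checksum
    can be carried through consecutive affine loops without re-summing arrays.
    """
    return c * sum_b + n * d


def checksum_check(a: List[float], expected_sum: float, tol: float = 1e-9) -> bool:
    """Compare the actual checksum of A against the predicted one."""
    return math.isclose(sum(a), expected_sum, rel_tol=tol, abs_tol=tol)


def nonaffine_stmt(b: List[float], fault: Fault = None) -> List[float]:
    """Non-affine loop A[i] = B[i] * B[i]; the sum invariant above does not apply."""
    a = [x * x for x in b]
    return fault(a) if fault else a


def duplicated_check(b: List[float], fault: Fault = None) -> Tuple[List[float], bool]:
    """Check by duplication: compute twice (primary may be faulty) and compare."""
    primary = nonaffine_stmt(b, fault)
    shadow = nonaffine_stmt(b)
    return primary, primary == shadow


def flip_first(a: List[float]) -> List[float]:
    """Simulated data error: perturb the first element of a result."""
    return [a[0] + 1.0] + a[1:]


if __name__ == "__main__":
    b = [1.0, 2.0, 3.0, 4.0]
    expected_a = propagate_checksum(sum(b), len(b), 2.0, 0.5)
    print(checksum_check(affine_stmt(b, 2.0, 0.5), expected_a))              # True
    print(checksum_check(affine_stmt(b, 2.0, 0.5, flip_first), expected_a))  # False
    print(duplicated_check(b)[1])                                            # True
    print(duplicated_check(b, flip_first)[1])                                # False
```

A single sum checksum is cheap but can miss errors that cancel (for example, +1 in one element and -1 in another), and this sketch runs the original and check computations in one process, whereas the abstract describes placing them on different processors via data distribution directives.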