• DocumentCode
    652279
  • Title
    Saving Time in a Program Robustness Evaluation
  • Author
    Gramacho, Joao ; Rexachs, Dolores ; Luque, Emilio
  • Author_Institution
    Comput. Archit. & Oper. Syst. Dept., Univ. Autonoma de Barcelona, Barcelona, Spain
  • fYear
    2013
  • fDate
    16-18 July 2013
  • Firstpage
    1274
  • Lastpage
    1282
  • Abstract
    The risk of a program execution being corrupted by transient faults is growing as computer processors use more transistors, become denser, and operate at lower voltages. This risk is multiplied in High Performance Computing, where hundreds or thousands of processors work together to solve a single problem. To evaluate how program executions behave in the presence of transient faults, we have proposed the concept of robustness against transient faults. This concept can be used to determine which parts of a program are most significant with respect to the risk of misbehavior caused by transient faults, so that they can be studied further and improved. The robustness concept can also be used as a metric to compare different approaches applied to a program to make it less likely to produce corrupted results. In this work we present why and how it is possible to simplify a fraction of a program's robustness analysis by taking into account the repetition of sequences of instructions. The simplified analysis obtains exactly the same result as a full program robustness evaluation (exhaustive and without estimations). By simplifying the analysis we were able to reduce our previously published robustness analysis time by a factor of up to 192 and to evaluate larger programs in feasible time (which would be unimaginable using executions in a fault-injection-capable environment).
  • Keywords
    microprocessor chips; computer processors; high performance computing; instruction sequences; program execution; program robustness evaluation; Absorption; Compression algorithms; Computer architecture; Program processors; Registers; Robustness; Transient analysis; Transient faults; reliability; robustness; simplification; soft errors
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Trust, Security and Privacy in Computing and Communications (TrustCom), 2013 12th IEEE International Conference on
  • Conference_Location
    Melbourne, VIC
  • Type
    conf
  • DOI
    10.1109/TrustCom.2013.237
  • Filename
    6680974