• DocumentCode
    3706516
  • Title

    Bit Flipping Errors in High Performance Linpack at Exascale and Beyond

  • Author

    Erlin Yao;Guangming Tan

  • Author_Institution
    Inst. of Comput. Technol., Beijing, China
  • fYear
    2015
  • Firstpage
    420
  • Lastpage
    429
  • Abstract
    For the High Performance Linpack (HPL) benchmark at the coming Exascale and beyond, silent errors such as bit flips in memory are expected to become inevitable. However, because bit flipping errors are difficult to detect and locate, their impact on the numerical correctness of HPL has not been evaluated thoroughly and quantitatively, even though runs at Exascale are especially susceptible to them. This paper presents an initial quantitative analysis of the impact of bit flipping errors on the numerical correctness of HPL. To validate the numerical correctness of the computed solution, HPL performs a residual check after the approximate solution is obtained. This paper shows that when only one bit is flipped in any element of the original data matrix, and the flipped position is not the leading position of the exponent, the residual check in HPL will almost surely pass at the scale of exaflops and beyond. Experiments on a modified HPL in single precision at small scales have verified the theoretical results for double precision at Exascale. The results obtained in this paper can provide a better understanding of the impact of bit flipping errors on the numerical correctness of scientific computing applications.
  • Keywords
    "Linear systems","Hardware","Software","Fault tolerance","Supercomputers","Error correction codes"
  • Publisher
    IEEE
  • Conference_Titel
    Parallel Processing (ICPP), 2015 44th International Conference on
  • ISSN
    0190-3918
  • Type

    conf

  • DOI
    10.1109/ICPP.2015.51
  • Filename
    7349597
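
Illustrative sketch (not taken from the paper): the abstract's claim concerns a single bit flipped in one element of the data matrix, followed by HPL's residual check. The Python below is a minimal toy version of that scenario under stated assumptions. The helper names (`flip_bit`, `solve`, `scaled_residual`, `residual_after_flip`), the 8 x 8 problem size, and the random seed are hypothetical. The scaled residual and the threshold of 16 follow the form commonly reported for HPL 2.x and should be treated as an assumption here, as should the choice to evaluate the residual against the unflipped matrix. Only the two extreme positions are exercised (lowest mantissa bit and leading exponent bit); the sketch makes no claim about intermediate positions, and results at this tiny scale do not stand in for the paper's Exascale analysis.

```python
import random
import struct
import sys

THRESHOLD = 16.0  # assumed pass/fail bound for the scaled residual


def flip_bit(x, pos):
    """Flip bit `pos` of an IEEE 754 double (0 = lowest mantissa bit,
    52..62 = exponent, 62 = leading exponent bit, 63 = sign)."""
    (bits,) = struct.unpack("<Q", struct.pack("<d", x))
    (y,) = struct.unpack("<d", struct.pack("<Q", bits ^ (1 << pos)))
    return y


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented copy
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            m = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= m * M[k][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / M[i][i]
    return x


def scaled_residual(A, x, b):
    """||Ax - b||_inf / (eps * (||A||_inf * ||x||_inf + ||b||_inf) * n)."""
    n = len(A)
    r = max(abs(sum(A[i][j] * x[j] for j in range(n)) - b[i])
            for i in range(n))
    norm_a = max(sum(abs(v) for v in row) for row in A)
    norm_x = max(abs(v) for v in x)
    norm_b = max(abs(v) for v in b)
    eps = sys.float_info.epsilon / 2  # unit roundoff, an assumption
    return r / (eps * (norm_a * norm_x + norm_b) * n)


def residual_after_flip(pos, n=8, seed=2015):
    """Flip bit `pos` of A[0][0], solve the corrupted system, and
    evaluate the residual against the original (unflipped) matrix."""
    rng = random.Random(seed)
    A = [[rng.uniform(-0.5, 0.5) for _ in range(n)] for _ in range(n)]
    b = [rng.uniform(-0.5, 0.5) for _ in range(n)]
    corrupted = [row[:] for row in A]
    corrupted[0][0] = flip_bit(A[0][0], pos)
    x = solve(corrupted, b)
    return scaled_residual(A, x, b)


if __name__ == "__main__":
    for pos in (0, 62):
        value = residual_after_flip(pos)
        verdict = "PASSED" if value < THRESHOLD else "FAILED"
        print(f"bit {pos}: scaled residual = {value:.3e} -> {verdict}")
```

Entries drawn from [-0.5, 0.5) have a zero leading exponent bit, so flipping bit 62 turns a small entry into an enormous one, whereas flipping bit 0 changes the entry by about one unit in the last place. That contrast is what the sketch is meant to make visible.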