• DocumentCode
    1921954
  • Title
    Blue Gene/L Log Analysis and Time to Interrupt Estimation
  • Author
    Taerat, Narate ; Naksinehaboon, Nichamon ; Chandler, Clayton ; Elliott, James ; Leangsuksun, Chokchai Box ; Ostrouchov, George ; Scott, Stephen L. ; Engelmann, Christian
  • Author_Institution
    College of Engineering & Science, Louisiana Tech University, Ruston, LA
  • fYear
    2009
  • fDate
    16-19 March 2009
  • Firstpage
    173
  • Lastpage
    180
  • Abstract
    System- and application-level failures can be characterized by analyzing the relevant log files. The resulting data can then support numerous studies and future developments for mission-critical, large-scale computational architectures, in fields such as failure prediction, reliability modeling, performance modeling, and power awareness. In this paper, system logs covering a six-month period of the Blue Gene/L supercomputer were obtained and analyzed. Temporal filtering was applied to remove duplicated log messages. The filtered log information was then examined from both optimistic and pessimistic perspectives to observe failure behavior within the system. Further, various time-to-repair factors were applied to obtain the application time to interrupt, which will be used in further resilience modeling research.
  • Keywords
    parallel machines; system recovery; systems analysis; Blue Gene/L log analysis; Blue Gene/L supercomputer; application-level failure; duplicate log message; high performance computing; log file analysis; system-level failure; temporal filtering; time-to-interrupt estimation; Computer architecture; Failure analysis; Information filtering; Information filters; Large-scale systems; Mission critical systems; Power system modeling; Power system reliability; Predictive models; Supercomputers
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    International Conference on Availability, Reliability and Security (ARES '09), 2009
  • Conference_Location
    Fukuoka
  • Print_ISBN
    978-1-4244-3572-2
  • Electronic_ISBN
    978-0-7695-3564-7
  • Type
    conf
  • DOI
    10.1109/ARES.2009.105
  • Filename
    5066470