DocumentCode
1921954
Title
Blue Gene/L Log Analysis and Time to Interrupt Estimation
Author
Taerat, Narate ; Naksinehaboon, Nichamon ; Chandler, Clayton ; Elliott, James ; Leangsuksun, Chokchai Box ; Ostrouchov, George ; Scott, Stephen L. ; Engelmann, Christian
Author_Institution
Coll. of Eng. & Sci., Louisiana Tech Univ., Ruston, LA
fYear
2009
fDate
16-19 March 2009
Firstpage
173
Lastpage
180
Abstract
System- and application-level failures can be characterized by analyzing relevant log files. The resulting data can then be used in studies of, and future development for, mission-critical and large-scale computational architectures, in fields such as failure prediction, reliability modeling, performance modeling, and power awareness. In this paper, system logs covering a six-month period of the Blue Gene/L supercomputer were obtained and analyzed. Temporal filtering was applied to remove duplicated log messages. Optimistic and pessimistic perspectives were applied to the filtered log information to observe failure behavior within the system. Further, various time-to-repair factors were applied to obtain the application time to interrupt, which will be used in further resilience modeling research.
Keywords
parallel machines; system recovery; systems analysis; Blue Gene/L log analysis; Blue Gene/L supercomputer; application-level failure; duplicate log message; high performance computing; log file analysis; system-level failure; temporal filtering; time-to-interrupt estimation; Computer architecture; Failure analysis; Information filtering; Information filters; Large-scale systems; Mission critical systems; Power system modeling; Power system reliability; Predictive models; Supercomputers;
fLanguage
English
Publisher
IEEE
Conference_Titel
Availability, Reliability and Security, 2009. ARES '09. International Conference on
Conference_Location
Fukuoka
Print_ISBN
978-1-4244-3572-2
Electronic_ISBN
978-0-7695-3564-7
Type
conf
DOI
10.1109/ARES.2009.105
Filename
5066470