DocumentCode :
3056126
Title :
What Supercomputers Say: A Study of Five System Logs
Author :
Oliner, Adam ; Stearley, Jon
Author_Institution :
Stanford Univ., Stanford
fYear :
2007
fDate :
25-28 June 2007
Firstpage :
575
Lastpage :
584
Abstract :
If we hope to automatically detect and diagnose failures in large-scale computer systems, we must study real deployed systems and the data they generate. Progress has been hampered by the inaccessibility of empirical data. This paper addresses that dearth by examining system logs from five supercomputers, with the aim of providing useful insight and direction for future research into the use of such logs. We present details about the systems, methods of log collection, and how alerts were identified; propose a simpler and more effective filtering algorithm; and define operational context to encompass the crucial information that we found to be currently missing from most logs. The machines we consider (and the number of processors) are: Blue Gene/L (131072), Red Storm (10880), Thunderbird (9024), Spirit (1028), and Liberty (512). This is the first study of raw system logs from multiple supercomputers.
Keywords :
parallel machines; performance evaluation; filtering algorithm; large-scale computer systems; supercomputers; system logs; Chaotic communication; Computer science; Filtering algorithms; Laboratories; Large-scale systems; Power system reliability; Pressing; Storms; Supercomputers; Tagging;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Dependable Systems and Networks, 2007. DSN '07. 37th Annual IEEE/IFIP International Conference on
Conference_Location :
Edinburgh
Print_ISBN :
0-7695-2855-4
Type :
conf
DOI :
10.1109/DSN.2007.103
Filename :
4273008