DocumentCode :
3678435
Title :
New Systems, New Behaviors, New Patterns: Monitoring Insights from System Standup
Author :
Jim Brandt;Ann Gentile;Cindy Martin;Jason Repik;Narate Taerat
Author_Institution :
Sandia Nat. Labs., Albuquerque, NM, USA
fYear :
2015
Firstpage :
658
Lastpage :
665
Abstract :
Disentangling significant and important log messages from those that are routine and unimportant can be a difficult task. Further, on a new system, understanding correlations between significant and possibly new types of messages and the conditions that cause them can require significant effort and time. The initial standup of a machine can provide opportunities for investigating the parameter space of events and operations and thus for gaining insight into the events of interest. In particular, failure inducement and investigation of corner-case conditions can provide knowledge of system behavior for significant issues, enabling easier diagnosis and mitigation of such issues when they actually occur during the platform lifetime. In this work, we describe the testing process and monitoring results from a testbed system in preparation for the ACES Trinity system. We describe how events in the initial standup, including changes in configuration and software and corner-case testing, have provided insights that can inform future monitoring and operating conditions, both of our test systems and of the eventual large-scale Trinity system.
Keywords :
"Blades","Testing","Monitoring","Program processors","Cooling","Layout","Temperature"
Publisher :
IEEE
Conference_Title :
Cluster Computing (CLUSTER), 2015 IEEE International Conference on
Type :
conf
DOI :
10.1109/CLUSTER.2015.116
Filename :
7307665