DocumentCode :
2050612
Title :
Filtering System Metrics for Minimal Correlation-Based Self-Monitoring
Author :
Munawar, Mohammad A. ; Jiang, Miao ; Reidemeister, Thomas ; Ward, Paul A S
Author_Institution :
Dept. of Electr. & Comput. Eng., Univ. of Waterloo, Waterloo, ON, Canada
fYear :
2009
fDate :
14-18 Sept. 2009
Firstpage :
233
Lastpage :
242
Abstract :
Self-adaptive and self-organizing systems must be self-monitoring. Recent research has shown that self-monitoring can be enabled by using correlations between monitoring variables (metrics). However, computer systems often make a very large number of metrics available for collection. Collecting them all not only reduces system performance, but also creates other overheads related to communication, storage, and processing. In order to control the overhead, it is necessary to limit collection to a subset of the available metrics. Manual selection of metrics requires a good understanding of system internals, which can be difficult given the size and complexity of modern computer systems. In this paper, assuming no knowledge of metric semantics or importance and no advance availability of fault data, we investigate automated methods for selecting a subset of available metrics in the context of correlation-based monitoring. Our goal is to collect fewer metrics while maintaining the ability to detect errors. We propose several metric selection methods that require no information besides correlations. We compare these methods on the basis of fault coverage. We show that our minimum spanning tree-based selection performs best, detecting on average 66% of faults detectable by full monitoring (i.e., using all considered metrics) with only 30% of the metrics.
Keywords :
filtering theory; monitoring; self-adjusting systems; software metrics; trees (mathematics); computer systems; filtering system metrics; metric semantics; minimal correlation-based self-monitoring; minimum spanning tree; self-adaptive systems; self-organizing systems; Communication system control; Computer errors; Computerized monitoring; Fault detection; Filtering; Humans; Predictive models; Pressing; Software systems; System performance; adaptive monitoring; error detection; metric correlations; self-monitoring; subset selection;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Self-Adaptive and Self-Organizing Systems, 2009. SASO '09. Third IEEE International Conference on
Conference_Location :
San Francisco, CA
Print_ISBN :
978-1-4244-4890-6
Electronic_ISBN :
978-0-7695-3794-8
Type :
conf
DOI :
10.1109/SASO.2009.36
Filename :
5298441