Addressing software dependability with statistical and machine learning techniques

Author

Fox, Armando

Author_Institution

Stanford Univ., CA, USA

fYear

2005

fDate

15-21 May 2005

Firstpage

8

Abstract

Summary form only given. Our ability to design and deploy large complex systems is outpacing our ability to understand their behavior. How do we detect and recover from "heisenbugs", which account for up to 40% of failures in complex Internet systems, without extensive application-specific coding? Which users were affected, and for how long? How do we diagnose and correct problems caused by configuration errors or operator errors? Although these problems are posed at a high level of abstraction, all we can usually measure directly are low-level behaviors, which is analogous to driving a car while looking through a magnifying glass. Machine learning can bridge this gap using techniques that learn "baseline" models automatically or semi-automatically, allowing the characterization and monitoring of systems whose structure is not well understood a priori. This paper discusses initial successes and future challenges in using machine learning for failure detection and diagnosis, configuration troubleshooting, attribution (which low-level properties appear to be correlated with an observed high-level effect such as decreased performance), and failure forecasting.

Keywords

learning (artificial intelligence); program diagnostics; software reliability; statistical analysis; configuration troubleshooting; machine learning; software dependability; software failure detection; software failure diagnosis; software failure forecasting; statistical techniques; system monitoring; Bridges; Computerized monitoring; Condition monitoring; Error correction; Glass; Internet; Machine learning

fLanguage

English

Publisher

IEEE

Conference_Titel

Software Engineering, 2005. ICSE 2005. Proceedings. 27th International Conference on

Print_ISBN

1-59593-963-2

Type

conf

DOI

10.1109/ICSE.2005.1553531

Filename

1553531