DocumentCode
3128200
Title
Automatic Failure Diagnosis Support in Distributed Large-Scale Software Systems Based on Timing Behavior Anomaly Correlation
Author
Marwede, Nina ; Rohr, Matthias ; Van Hoorn, Andre ; Hasselbring, Wilhelm
Author_Institution
BTC Bus. Technol. Consulting AG, Oldenburg
fYear
2009
fDate
24-27 March 2009
Firstpage
47
Lastpage
58
Abstract
Manual failure diagnosis in large-scale software systems is time-consuming and error-prone. Automatic failure diagnosis support mechanisms can potentially narrow down, or even localize, faults within a very short time, which helps to preserve system availability. A large class of automatic failure diagnosis approaches consists of two steps: 1) computation of component anomaly scores; 2) global correlation of the anomaly scores for fault localization. In this paper, we present an architecture-centric approach for the second step. In our approach, component anomaly scores are correlated based on architectural dependency graphs of the software system and a rule set to address error propagation. Moreover, the results are graphically visualized in order to support fault localization and to enhance maintainability. The visualization combines architectural diagrams automatically derived from monitoring data with failure diagnosis results. In a case study, the approach is applied to a distributed sample Web application that is subjected to fault injection.
Keywords
distributed processing; software reliability; anomaly correlation; architectural dependency graphs; automatic failure diagnosis; distributed large-scale software systems; fault localization; manual failure diagnosis; system availability; timing behavior; Application software; Computerized monitoring; Data visualization; Fault detection; Fault diagnosis; Large-scale systems; Software engineering; Software maintenance; Software systems; Timing; dependency graphs; failure diagnosis; performance analysis; response times; software faults
fLanguage
English
Publisher
IEEE
Conference_Titel
Software Maintenance and Reengineering, 2009. CSMR '09. 13th European Conference on
Conference_Location
Kaiserslautern
ISSN
1534-5351
Print_ISBN
978-0-7695-3589-0
Type
conf
DOI
10.1109/CSMR.2009.15
Filename
4812738