Title :
Diagnosing architectural run-time failures
Author :
Casanova, Paulo ; Garlan, David ; Schmerl, Bradley ; Abreu, Rui
Author_Institution :
Sch. of Comput. Sci., Carnegie Mellon Univ., Pittsburgh, PA, USA
Abstract :
Self-diagnosis is a fundamental capability of self-adaptive systems. In order to recover from faults, systems need to know which part is responsible for the incorrect behavior. In previous work we showed how to apply a design-time diagnosis technique at run time to identify faults at the architectural level of a system. Our contributions address three major shortcomings of our previous work: 1) we present an expressive, hierarchical language to describe system behavior that can be used to diagnose when a system is behaving differently from what is expected; the hierarchical language facilitates mapping low-level system events to architecture-level events; 2) we provide an automatic way to determine how much data to collect before an accurate diagnosis can be produced; and 3) we develop a technique that allows the detection of correlated faults between components. Our results are validated experimentally by injecting several failures into a system and accurately diagnosing them using our algorithm.
Keywords :
software architecture; software fault tolerance; architectural run-time failure diagnosis; design-time diagnosis technique; fault identification; hierarchical language; self-adaptive system; Cognition; Computational modeling; Databases; Fault diagnosis; Monitoring; Probes; Web servers
Conference_Title :
Software Engineering for Adaptive and Self-Managing Systems (SEAMS), 2013 ICSE Workshop on
Conference_Location :
San Francisco, CA
Print_ISBN :
978-1-4799-0344-3
DOI :
10.1109/SEAMS.2013.6595497