DocumentCode
3124743
Title
Automated Diagnosis of System Failures with Fa
Author
Duan, Songyun ; Babu, Shivnath
Author_Institution
Dept. of Comput. Sci., Duke Univ., Durham, NC
fYear
2009
fDate
March 29 - April 2, 2009
Firstpage
1499
Lastpage
1502
Abstract
While quick failure diagnosis and system recovery are critical, database and system administrators continue to struggle with this problem. The spectrum of possible causes of failure is huge: performance problems such as resource contention, crashes due to hardware faults or software bugs, misconfiguration by system operators, and many others. The scale, complexity, and dynamics of modern systems make it laborious and time-consuming to track down the cause of a failure manually. Conventional data-mining techniques such as clustering and classification have much to offer for the hard problem of failure diagnosis. These techniques can be applied to the wealth of monitoring data that operational systems collect. However, several novel challenges must be solved before these techniques can deliver an automated, efficient, and reasonably accurate tool for diagnosing failures from monitoring data, one that is also easy and intuitive to use. Fa is a new system for automated diagnosis of system failures, designed to address these challenges. While a system is running, Fa periodically collects monitoring data and stores it in a database.
Keywords
data mining; fault diagnosis; fault tolerant computing; pattern classification; pattern clustering; system monitoring; system recovery; Fa tool; automated system failure diagnosis; data classification; data clustering; data-mining technique; database administration; hardware fault; performance problem; resource contention; software bug; system administration; system monitoring data; Banking; Computer errors; Computer science; Computerized monitoring; Condition monitoring; Costs; Data engineering; Databases; Gaussian noise; Productivity
fLanguage
English
Publisher
IEEE
Conference_Titel
Data Engineering, 2009. ICDE '09. IEEE 25th International Conference on
Conference_Location
Shanghai
ISSN
1084-4627
Print_ISBN
978-1-4244-3422-0
Electronic_ISBN
1084-4627
Type
conf
DOI
10.1109/ICDE.2009.118
Filename
4812557