DocumentCode :
3124743
Title :
Automated Diagnosis of System Failures with Fa
Author :
Duan, Songyun ; Babu, Shivnath
Author_Institution :
Dept. of Comput. Sci., Duke Univ., Durham, NC
fYear :
2009
fDate :
March 29 2009-April 2 2009
Firstpage :
1499
Lastpage :
1502
Abstract :
While quick failure diagnosis and system recovery are critical, database and system administrators continue to struggle with this problem. The spectrum of possible causes of failure is huge: performance problems like resource contention, crashes due to hardware faults or software bugs, misconfiguration by system operators, and many others. The scale, complexity, and dynamics of modern systems make it laborious and time-consuming to track down the cause of failures manually. Conventional data-mining techniques like clustering and classification have a lot to offer to the hard problem of failure diagnosis. These techniques can be applied to the wealth of monitoring data that operational systems collect. However, some novel challenges need to be solved before these techniques can deliver an automated, efficient, and reasonably accurate tool for diagnosing failures using monitoring data, one that is also easy and intuitive to use. Fa is a new system for automated diagnosis of system failures that is designed to address the above challenges. When a system is running, Fa collects monitoring data periodically and stores it in a database.
Keywords :
data mining; fault diagnosis; fault tolerant computing; pattern classification; pattern clustering; system monitoring; system recovery; Fa tool; automated system failure diagnosis; data classification; data clustering; data-mining technique; database administration; hardware fault; performance problem; resource contention; software bug; system administration; system monitoring data; system recovery; Banking; Computer errors; Computer science; Computerized monitoring; Condition monitoring; Costs; Data engineering; Databases; Gaussian noise; Productivity;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Data Engineering, 2009. ICDE '09. IEEE 25th International Conference on
Conference_Location :
Shanghai
ISSN :
1084-4627
Print_ISBN :
978-1-4244-3422-0
Electronic_ISBN :
1084-4627
Type :
conf
DOI :
10.1109/ICDE.2009.118
Filename :
4812557