• DocumentCode
    3124743
  • Title
    Automated Diagnosis of System Failures with Fa
  • Author
    Duan, Songyun ; Babu, Shivnath
  • Author_Institution
    Dept. of Comput. Sci., Duke Univ., Durham, NC
  • fYear
    2009
  • fDate
    March 29 2009-April 2 2009
  • Firstpage
    1499
  • Lastpage
    1502
  • Abstract
    While quick failure diagnosis and system recovery are critical, database and system administrators continue to struggle with this problem. The spectrum of possible causes of failure is huge: performance problems like resource contention, crashes due to hardware faults or software bugs, misconfiguration by system operators, and many others. The scale, complexity, and dynamics of modern systems make it laborious and time-consuming to track down the cause of failures manually. Conventional data-mining techniques like clustering and classification have a lot to offer for the hard problem of failure diagnosis. These techniques can be applied to the wealth of monitoring data that operational systems collect. However, some novel challenges need to be solved before these techniques can deliver an automated, efficient, and reasonably accurate tool for diagnosing failures using monitoring data; a tool that is easy and intuitive to use. Fa is a new system for automated diagnosis of system failures that is designed to address the above challenges. When a system is running, Fa collects monitoring data periodically and stores it in a database.
  • Keywords
    data mining; fault diagnosis; fault tolerant computing; pattern classification; pattern clustering; system monitoring; system recovery; Fa tool; automated system failure diagnosis; data classification; data clustering; data-mining technique; database administration; hardware fault; performance problem; resource contention; software bug; system administration; system monitoring data; Banking; Computer errors; Computer science; Computerized monitoring; Condition monitoring; Costs; Data engineering; Databases; Gaussian noise; Productivity
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Engineering, 2009. ICDE '09. IEEE 25th International Conference on
  • Conference_Location
    Shanghai
  • ISSN
    1084-4627
  • Print_ISBN
    978-1-4244-3422-0
  • Electronic_ISBN
    1084-4627
  • Type
    conf
  • DOI
    10.1109/ICDE.2009.118
  • Filename
    4812557