• DocumentCode
    120747
  • Title

    A pragmatic approach to predict hardware failures in storage systems using MPP database and big data technologies

  • Author

    Kumar, Ravindra ; Vijayakumar, Sethu ; Ahamed, Syed Azar

  • Author_Institution
    TATA Consultancy Services Ltd., Bangalore, India
  • fYear
    2014
  • fDate
    21-22 Feb. 2014
  • Firstpage
    779
  • Lastpage
    788
  • Abstract
    A storage system in a data center consists of various components such as Disk Array Enclosures (DAEs), disks, processors, servers, and hosts running different applications. Hard disk and server failures are infrequent but often very costly, and they can have a severe adverse effect on an organization's business. The ability to accurately predict an impending disk or server failure is therefore an essential capability for designing a reliable, fault-tolerant, and continuously available storage system. This paper explains a novel approach to predicting hardware failures using a spectrum-kernel Parallel Support Vector Machine (Parallel SVM) method that analyzes the system events logged in the system log files. These log files not only record the events processed by the system but also hold the messages emitted as the system state changes. A single message in the system log file is insufficient for prediction; any prediction based on it is bound to be less accurate. The approach introduced in the paper instead uses a sequence or pattern of messages from the system log file, taken as a Sliding Window with a window size of 5 messages, to predict the likelihood of a failure. These Sliding Windows of message sequences act as inputs to the Parallel SVM, which then tags the messages as belonging to a failure or non-failure system. Data Mining techniques are used to extract useful information from the raw dataset, and a solution model is developed using the structured dataset and Machine Learning algorithms. When implemented using actual system logs from a Linux-based storage system, this environment has been shown to predict hardware failures with an accuracy of 90-92 percent.
  • Keywords
    Big Data; Linux; computer centres; data mining; hard discs; learning (artificial intelligence); parallel processing; support vector machines; system recovery; Data Mining techniques; Linux-based storage system; continuously available storage system; failure system; fault tolerant storage system; hard disk failure; hardware failure prediction; information extracting; machine learning algorithms; message pattern; message sequence; nonfailure system; parallel SVM; raw dataset; server failure; sliding window size; spectrum-kernel parallel support vector machine method; structured dataset; system log files; Conferences; Decision support systems; Handheld computers; Big Data Analytics; Cloud Computing; Hard disk & Server failure prediction; MPP Database; Machine Learning; Parallel SVM Classification;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Advance Computing Conference (IACC), 2014 IEEE International
  • Conference_Location
    Gurgaon
  • Print_ISBN
    978-1-4799-2571-1
  • Type

    conf

  • DOI
    10.1109/IAdCC.2014.6779422
  • Filename
    6779422
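
The abstract describes two mechanical steps: cutting the log into Sliding Windows of 5 messages, and comparing those windows with a spectrum kernel before they reach the SVM. The sketch below illustrates both under stated assumptions and is not the authors' implementation. The function names, the message tokens, and the subsequence length `k=2` are hypothetical; the abstract gives only the window size of 5 and does not say how messages are encoded or how the Parallel SVM is trained.

```python
from collections import Counter
from typing import Hashable, List, Sequence, Tuple

WINDOW_SIZE = 5  # the only parameter taken from the abstract


def sliding_windows(messages: Sequence[Hashable],
                    size: int = WINDOW_SIZE) -> List[Tuple[Hashable, ...]]:
    """Return every run of `size` consecutive messages, stepping one message at a time."""
    return [tuple(messages[i:i + size])
            for i in range(len(messages) - size + 1)]


def kmer_counts(window: Sequence[Hashable], k: int) -> Counter:
    """Count each contiguous length-k subsequence (k-mer) in a window."""
    return Counter(tuple(window[i:i + k])
                   for i in range(len(window) - k + 1))


def spectrum_kernel(a: Sequence[Hashable], b: Sequence[Hashable], k: int = 2) -> int:
    """Spectrum kernel: inner product of the two windows' k-mer count vectors.

    k=2 is an assumption made for illustration only.
    """
    counts_a, counts_b = kmer_counts(a, k), kmer_counts(b, k)
    return sum(n * counts_b[kmer] for kmer, n in counts_a.items())


# Hypothetical message-type tokens standing in for parsed log lines.
log = ["io_err", "retry", "io_err", "retry", "timeout", "reset"]
windows = sliding_windows(log)

# Pairwise kernel values between windows (a Gram matrix).
gram = [[spectrum_kernel(w1, w2) for w2 in windows] for w1 in windows]
```

A Gram matrix of this kind is the usual input to an SVM that accepts a precomputed kernel; the failure or non-failure labels for each window would have to come from the labeled dataset, which the abstract does not describe.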