DocumentCode :
3145987
Title :
Predicting Node Failure in High Performance Computing Systems from Failure and Usage Logs
Author :
Nakka, Nithin ; Agrawal, Ankit ; Choudhary, Alok
Author_Institution :
Coordinated Sci. Lab., Univ. of Illinois at Urbana-Champaign, Urbana, IL, USA
fYear :
2011
fDate :
16-20 May 2011
Firstpage :
1557
Lastpage :
1566
Abstract :
In this paper, we apply data mining classification schemes to predict failures in a high performance computing system. Failure and usage logs collected on supercomputing clusters at Los Alamos National Laboratory (LANL) were used to extract instances of failure information. For each failure instance, past and future failure information is accumulated: time of usage, system idle time, time of unavailability, time since last failure, and time to next failure. We performed two separate analyses, one with and one without classifying the failures by their root cause. Based on this data, we applied several popular decision tree classifiers to predict whether a failure would occur within 1 hour. Our experiments show that the prediction system predicts failures with a precision of up to 73% and a recall of about 80%. We also observed that using the usage data together with the failure data improved prediction accuracy.
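The labeling step the abstract describes (for each failure instance, gather time since last failure and time to next failure, then ask whether another failure follows within 1 hour) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' pipeline: the input format (a list of `(node_id, timestamp_in_hours)` pairs), the per-node grouping, and the names `build_instances` and `precision_recall` are all hypothetical. The usage-log features (time of usage, idle time, unavailability) and the decision tree classifiers themselves are not reproduced here.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Sequence, Tuple


@dataclass
class FailureInstance:
    node: str
    time_h: float                   # failure timestamp in hours (assumed unit)
    since_last_h: Optional[float]   # hours since previous failure on the same node
    to_next_h: Optional[float]      # hours until next failure on the same node
    fails_within_horizon: bool      # target label: next failure within horizon


def build_instances(events: Iterable[Tuple[str, float]],
                    horizon_h: float = 1.0) -> List[FailureInstance]:
    """Turn raw (node, time) failure records (hypothetical format) into
    labeled instances. In this sketch, to_next_h is used only to derive the
    label; feeding it to a classifier as a feature would leak the answer."""
    by_node = {}
    for node, t in events:
        by_node.setdefault(node, []).append(t)

    instances = []
    for node in sorted(by_node):
        times = sorted(by_node[node])
        for i, t in enumerate(times):
            since_last = t - times[i - 1] if i > 0 else None
            to_next = times[i + 1] - t if i + 1 < len(times) else None
            label = to_next is not None and to_next <= horizon_h
            instances.append(FailureInstance(node, t, since_last, to_next, label))
    return instances


def precision_recall(y_true: Sequence[bool],
                     y_pred: Sequence[bool]) -> Tuple[float, float]:
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

For example, `build_instances([("n1", 0.0), ("n1", 0.5), ("n1", 10.0), ("n2", 3.0)])` labels only the first `n1` failure as positive, because it is the only one followed by another failure on the same node within 1 hour. The precision and recall figures quoted in the abstract are the kind of metric `precision_recall` computes over a classifier's predictions.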
Keywords :
data mining; decision trees; pattern classification; system recovery; data mining classification; decision tree classifier; failure data log; failure information; high performance computing system; node failure; prediction system; supercomputing cluster; system idle time; unavailability time; usage data log; Data mining; Databases; Hardware; Humans; Maintenance engineering; Program processors;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Parallel and Distributed Processing Workshops and PhD Forum (IPDPSW), 2011 IEEE International Symposium on
Conference_Location :
Shanghai
ISSN :
1530-2075
Print_ISBN :
978-1-61284-425-1
Electronic_ISBN :
1530-2075
Type :
conf
DOI :
10.1109/IPDPS.2011.310
Filename :
6009015