DocumentCode :
3279300
Title :
Predicting job completion times using system logs in supercomputing clusters
Author :
Xin Chen ; Charng-Da Lu ; Karthik Pattabiraman
Author_Institution :
Dept. of Electr. & Comput. Eng., Univ. of British Columbia, Vancouver, BC, Canada
fYear :
2013
fDate :
24-27 June 2013
Firstpage :
1
Lastpage :
8
Abstract :
Most large systems, such as HPC/cloud computing clusters and data centers, are built from commercial off-the-shelf components. System logs are usually the main source for gaining insight into system issues, so mining logs to diagnose anomalies has been an active research area. Due to the lack of organization and semantic consistency in commodity PC clusters' logs, what constitutes a fault or an error is subjective, and thus building an automatic failure prediction model from log messages is hard. In this paper, we sidestep the difficulty by asking a different question: given the concomitant system log messages of a running job, can we predict the job's remaining time? We adopt a Hidden Markov Model (HMM) coupled with frequency analysis to achieve this. Our HMM approach can predict 75% of jobs' remaining times with an error of less than 200 seconds.
Keywords :
cloud computing; computer centres; data mining; fault tolerant computing; hidden Markov models; parallel machines; security of data; software packages; system monitoring; HMM; HPC clusters; anomaly diagnosis; automatic failure prediction model; cloud computing clusters; commercial off-the-shelf components; commodity PC clusters logs; data centers; hidden Markov model; job completion time prediction; lack-of-organization; lack-of-semantic consistency; log mining; supercomputing clusters; system logs; Absorption; Computational modeling; Hardware; Hidden Markov models; Kernel; Markov processes; Training; Hidden Markov Model; Log Analysis; Prediction;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Dependable Systems and Networks Workshop (DSN-W), 2013 43rd Annual IEEE/IFIP Conference on
Conference_Location :
Budapest
ISSN :
2325-6648
Type :
conf
DOI :
10.1109/DSNW.2013.6615513
Filename :
6615513