Author/Authors :
Yang, Zhe School of Computer Science - Wuhan University, Wuhan, China , Ying, Shi School of Computer Science - Wuhan University, Wuhan, China , Wang, Bingming School of Computer Science - Wuhan University, Wuhan, China , Li, Yiyao School of Software Engineering - Tongji University, Shanghai, China , Dong, Bo School of Computer Science - Wuhan University, Wuhan, China , Geng, Jiangyi School of Computer Science - Wuhan University, Wuhan, China , Zhang, Ting School of Computer Science - Wuhan University, Wuhan, China
Abstract :
Log-analysis-based system fault diagnosis helps engineers analyze the fault events that a system generates. The K-means algorithm performs log analysis well and requires little prior knowledge, but K-means-based fault diagnosis still needs improvement in both efficiency and accuracy. To address this, we propose a system fault diagnosis method based on a reclustering algorithm. First, we propose a log vectorization method based on the PV-DM language model that produces low-dimensional log vectors, which provide effective data support for the subsequent fault diagnosis. Next, we improve the K-means algorithm to make K-means-based log clustering more effective. Finally, we propose a reclustering method based on keyword extraction to improve the accuracy of fault diagnosis. We verify our method on system log data generated by two supercomputers. The experimental results show that, compared with the traditional K-means method, our method improves the accuracy of fault diagnosis while maintaining its efficiency.
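The three stages named in the abstract (vectorize, cluster, recluster) can be pictured with a minimal sketch. It is not the authors' implementation, and several parts are assumptions made for illustration: bag-of-words counts stand in for PV-DM vectors, the K-means step is the unmodified Lloyd's algorithm with deterministic seeding, and the reclustering rule (reassign each log to the cluster whose extracted keywords it overlaps most) is a guess at what keyword-based reclustering could look like. The function names and the `top_n` parameter are hypothetical.

```python
from collections import Counter


def vectorize(logs):
    """Bag-of-words counts. ASSUMPTION: a simple stand-in for PV-DM vectors."""
    vocab = sorted({tok for line in logs for tok in line.lower().split()})
    index = {tok: i for i, tok in enumerate(vocab)}
    vectors = []
    for line in logs:
        vec = [0.0] * len(vocab)
        for tok in line.lower().split():
            vec[index[tok]] += 1.0
        vectors.append(vec)
    return vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, max_iter=50):
    """Plain Lloyd's algorithm, seeded with the first k vectors (needs len >= k)."""
    centroids = [list(v) for v in vectors[:k]]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda c: sq_dist(v, centroids[c])) for v in vectors
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def cluster_keywords(logs, labels, k, top_n=2):
    """Per cluster, the top_n tokens by (in-cluster count - out-of-cluster count)."""
    inside = [Counter() for _ in range(k)]
    total = Counter()
    for line, lab in zip(logs, labels):
        toks = line.lower().split()
        inside[lab].update(toks)
        total.update(toks)
    keywords = []
    for c in range(k):
        score = {t: 2 * n - total[t] for t, n in inside[c].items()}
        ranked = sorted(score, key=lambda t: (-score[t], t))
        keywords.append(set(ranked[:top_n]))
    return keywords


def recluster(logs, labels, k, top_n=2):
    """ASSUMED rule: move each log to the cluster whose keywords it overlaps most.

    Ties are resolved in favor of the log's current cluster.
    """
    keywords = cluster_keywords(logs, labels, k, top_n)
    new_labels = []
    for line, old in zip(logs, labels):
        toks = set(line.lower().split())
        best = max(range(k), key=lambda c: (len(toks & keywords[c]), c == old))
        new_labels.append(best)
    return new_labels
```

A caller would run `labels = kmeans(vectorize(logs), k)` and then `recluster(logs, labels, k)`. In practice the vectorization step would be replaced by a trained PV-DM model, which this sketch does not attempt to reproduce.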
Keywords :
Algorithm , Reclustering Algorithm , Diagnosis Method , A System Fault