Title :
System anomaly detection in distributed systems through MapReduce-Based log analysis
Author :
Liu, Yan ; Pan, Wei ; Cao, Ning ; Qiao, Guangwei
Author_Institution :
Ideal Inst. of Inf. & Technol., Northeast Normal Univ., Changchun, China
Abstract :
System anomaly detection is important for the development, maintenance, and performance tuning of large-scale distributed systems. Analyzing the system logs these systems produce is an effective means of troubleshooting and problem diagnosis. However, as distributed systems grow in scale and complexity, their logs become very large, so conventional methods that analyze logs on a single node are inefficient. There is therefore a strong need for distributed approaches to log-based anomaly detection. In this paper, we implement a MapReduce-based framework that analyzes distributed logs to detect anomalies. The framework is built on top of Hadoop, an open-source distributed file system and MapReduce implementation. We first use Random Access File to aggregate system logs incrementally from each node of the monitored cluster and collect them on the analysis cluster. We then apply the K-means clustering algorithm to integrate the collected logs, and implement a MapReduce-based algorithm to parse the clustered log files. Finally, to make the best use of the collected data, a flexible and powerful display method is used to present the monitoring and analysis results. In this way, we can monitor the status of a large distributed cluster and detect its anomalies.
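The abstract gives no implementation details. The following is a minimal sketch of two of the described steps, incremental log collection and MapReduce-style parsing, under several assumptions. It uses Python file `seek`/`tell`-style offsets as an analog of Java's `RandomAccessFile` rather than the authors' Java/Hadoop code. The names `read_new_lines`, `map_line`, `reduce_counts`, and `analyze` are hypothetical. The digit-masking "template" key is an illustrative parsing rule, not the paper's. The map and reduce phases run in-process instead of on a Hadoop cluster, and the K-means step is omitted.

```python
import os
import re
import tempfile
from collections import defaultdict
from itertools import chain


def read_new_lines(path, offset):
    """Return complete lines appended to `path` since byte `offset`, plus the new offset.

    Analog of incremental collection with a random-access file: seek to the last
    position seen, read what was appended, and remember where we stopped.
    A trailing partial line (no newline yet) is left for the next call.
    """
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read()
    end = data.rfind(b"\n")
    if end == -1:
        return [], offset  # no complete new line has been appended yet
    lines = data[: end + 1].decode("utf-8", errors="replace").splitlines()
    return lines, offset + end + 1


def map_line(line):
    """Map step (assumed rule): emit (template, 1), masking numbers so similar messages share a key."""
    yield re.sub(r"\d+", "<N>", line.strip()), 1


def reduce_counts(pairs):
    """Reduce step: sum the counts emitted for each template key."""
    totals = defaultdict(int)
    for key, count in pairs:
        totals[key] += count
    return dict(totals)


def analyze(lines):
    """Run the map step over every line, then group and reduce in-process."""
    return reduce_counts(chain.from_iterable(map_line(l) for l in lines))


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile("wb", delete=False, suffix=".log") as f:
        f.write(b"ERROR disk 3 full\nINFO job 17 done\nERROR disk 4 full\npartial")
        path = f.name
    lines, offset = read_new_lines(path, 0)
    print(analyze(lines))  # {'ERROR disk <N> full': 2, 'INFO job <N> done': 1}
    os.remove(path)
```

Keeping the byte offset per monitored file means each collection pass reads only newly appended data, which is the point of the incremental approach. In a real Hadoop job, `map_line` and `reduce_counts` would become a Mapper and a Reducer, with the framework handling the grouping by key.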
Keywords :
distributed processing; pattern clustering; public domain software; security of data; software maintenance; system monitoring; Hadoop; K-means clustering algorithm; MapReduce-based log analysis; open source distributed file system; problem diagnosis; random access file; system anomaly detection; Algorithm design and analysis; Clustering algorithms; Data mining; Distributed databases; Graphical user interfaces; Monitoring; Programming; K-means; MapReduce; anomaly detection; distributed system; log analysis;
Conference_Title :
Advanced Computer Theory and Engineering (ICACTE), 2010 3rd International Conference on
Conference_Location :
Chengdu
Print_ISBN :
978-1-4244-6539-2
DOI :
10.1109/ICACTE.2010.5579173