Title :
Identifying missed monitoring alerts based on unstructured incident tickets
Author :
Liang Tang ; Tao Li ; Shwartz, Larisa ; Grabarnik, Genady Ya
Author_Institution :
Sch. of Comput. Sci., Florida Int. Univ., Miami, FL, USA
Abstract :
Automatic system monitoring is an efficient and reliable means of problem detection in enterprise IT infrastructures. The performance of a monitoring system depends on its configuration, which is specified by the system administrators. In large, dynamic IT environments, the infrastructure changes frequently to meet various business requirements, so the configurations may not always be consistent with the current state of the system. Misconfigurations can lead to false positives (false alarms) and false negatives (missing alerts) for the system administrators. False negatives can cause serious system faults. This paper presents an automatic approach for discovering false negatives from incident tickets that are created by humans. The discovered results help the system administrators correct the misconfigurations and minimize false negatives in the future. The approach applies a text classification model to analyze the descriptions of incident tickets and identify the corresponding system issues. Domain knowledge for describing those issues can be incorporated to assist the model. Experiments are conducted on real system incident tickets from a large enterprise IT infrastructure. The experimental results demonstrate the effectiveness of the proposed approach.
Keywords :
business data processing; configuration management; pattern classification; software fault tolerance; system monitoring; text analysis; automatic system monitoring; business requirements; domain knowledge; dynamic IT environments; enterprise IT infrastructures; false alarms; false negative; large IT environments; misconfigurations; missed monitoring alerts identification; missing alerts; monitoring systems performance; problem detection; system administrators; system faults; text classification model; unstructured incident tickets; Accuracy; Manuals; Monitoring; Servers; Support vector machines; Training; Training data
Conference_Title :
Network and Service Management (CNSM), 2013 9th International Conference on
Conference_Location :
Zurich
DOI :
10.1109/CNSM.2013.6727825