DocumentCode
2284620
Title
A state machine approach for problem detection in large-scale distributed system
Author
Sun, Kewei ; Qiu, Jie ; Li, Ying ; Chen, Ying ; Ji, Weixing
Author_Institution
IBM China Res. Lab., Beijing
fYear
2008
fDate
7-11 April 2008
Firstpage
317
Lastpage
324
Abstract
Efficient problem detection methods play an important role in system management. This paper describes a formal method for problem detection in large-scale, distributed enterprise IT environments. Events from distributed system components are collected, filtered, and correlated. Using these correlated events, the behavior of a distributed system is represented as a problem detection state machine (PDSM). The PDSM is built automatically from system logs, without any specification of the target system. The approach combines logs from multiple sources and does not require human involvement or experimental instructions. It is generally applicable to a large class of distributed systems. Experimental results show that the PDSM implementation performs problem detection efficiently in typical distributed enterprise systems.
Keywords
distributed processing; finite state machines; virtual enterprises; distributed enterprise IT environment; distributed enterprise systems; distributed system components; formal method; large-scale distributed system; problem detection methods; problem detection state machine; system management; Computer science; Databases; Event detection; Laboratories; Large-scale systems; Middleware; Parallel processing; Pattern analysis; Quality of service; Sun; event correlation; log analysis; problem detection; state machine
fLanguage
English
Publisher
IEEE
Conference_Titel
Network Operations and Management Symposium, 2008. NOMS 2008. IEEE
Conference_Location
Salvador, Bahia
ISSN
1542-1201
Print_ISBN
978-1-4244-2065-0
Electronic_ISBN
1542-1201
Type
conf
DOI
10.1109/NOMS.2008.4575150
Filename
4575150