DocumentCode
2393542
Title
The scheme design of distributed systems service fault management based on active probing
Author
Deng, Li ; Qu, Xiaoyan ; Ma, Dengwu
Author_Institution
Dept. of Armament Sci. & Technol., Naval Aeronaut. & Astronaut. Univ., Yantai, China
fYear
2012
fDate
19-20 May 2012
Firstpage
1644
Lastpage
1649
Abstract
Service fault management in distributed computer systems and networks is a difficult task that requires highly efficient inference from large volumes of data. In this paper, we propose a solution to this problem. First, the challenges of service fault management in distributed systems are analyzed, and a multilayer model is recommended. Then, a dependency matrix that represents the causal relationship between faults and probes is defined, and the fault management framework is built. On this basis, a service fault management scheme using active probing is proposed. The scheme consists of two phases: fault detection and fault localization. In the first phase, we propose a probe selection algorithm that selects a minimal set of probes while retaining a high probability of fault detection. In the second phase, we propose a fault localization probe selection algorithm that selects additional probes to obtain more system information, based on the symptoms observed in the first phase. Finally, an example instance demonstrates the validity and efficiency of the scheme.
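The abstract does not give the probe selection algorithm itself. As a rough illustration of the detection phase only, the sketch below assumes the dependency matrix can be written as a mapping from each probe to the set of faults that would make it fail, and uses a plain greedy set-cover heuristic to pick a small probe set. The function name, the probe and fault labels, and the greedy strategy are all assumptions made for illustration, not the authors' method, and a greedy cover is small but not guaranteed to be minimal.

```python
from typing import Dict, List, Set


def select_detection_probes(dependency: Dict[str, Set[str]],
                            faults: Set[str]) -> List[str]:
    """Greedily pick probes until every coverable fault is detected.

    dependency[p] is the set of faults that cause probe p to fail
    (one row of a hypothetical probe/fault dependency matrix).
    """
    uncovered = set(faults)
    chosen: List[str] = []
    while uncovered:
        # Pick the probe that detects the most still-uncovered faults;
        # iterating in sorted order makes tie-breaking deterministic.
        best = max(sorted(dependency),
                   key=lambda p: len(dependency[p] & uncovered),
                   default=None)
        if best is None:
            break  # no probes available at all
        gain = dependency[best] & uncovered
        if not gain:
            break  # remaining faults are not detectable by any probe
        chosen.append(best)
        uncovered -= gain
    return chosen


# Hypothetical example: three probes, four faults.
example_dependency = {
    "p1": {"f1", "f2"},
    "p2": {"f2", "f3"},
    "p3": {"f3", "f4"},
}
example_faults = {"f1", "f2", "f3", "f4"}
selected = select_detection_probes(example_dependency, example_faults)
```

In this hypothetical example, two probes are enough to cover all four faults, so the middle probe is never sent during detection. A localization phase would then choose further probes that split the set of faults consistent with the observed failures.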
Keywords
distributed processing; inference mechanisms; software fault tolerance; active probing; dependency matrix; distributed systems service fault management; fault localization; inferences; multilayer model; probe selection algorithm; Algorithm design and analysis; Fault detection; Monitoring; Nonhomogeneous media; Probes; Quality of service; Software; active probing; distributed systems; fault management; service management;
fLanguage
English
Publisher
ieee
Conference_Titel
Systems and Informatics (ICSAI), 2012 International Conference on
Conference_Location
Yantai
Print_ISBN
978-1-4673-0198-5
Type
conf
DOI
10.1109/ICSAI.2012.6223356
Filename
6223356