DocumentCode :
2716060
Title :
Efficient Failure Detection and Recovery Scheme for Hierarchical Distributed Monitoring
Author :
Ahn, Jinho
Author_Institution :
Kyonggi Univ., Suwon
Volume :
2
fYear :
2007
fDate :
6-8 Dec. 2007
Firstpage :
510
Lastpage :
515
Abstract :
When developing networked or distributed systems, network monitoring is becoming an essential facility for controlling and managing their performance or quality of service. In particular, as networks rapidly scale up, distributed monitoring schemes based on a hierarchy of monitoring managers have been proposed and used. However, failures of some monitoring managers prevent the managed network elements from being continuously and correctly polled until the managers are repaired. To address this problem, this paper proposes an efficient monitoring manager fault-tolerance scheme that enables the managers to effectively exploit their hierarchical structure. The scheme achieves low failure detection overhead because each monitoring manager periodically sends a manager advertisement message only to its immediate super manager. Even if some managers crash concurrently, the scheme allows their immediate super managers to take over for them, which minimizes the number of live managers affected by the failures. Moreover, after failed managers have recovered, the scheme allows them to immediately resume their pre-failure roles in order to improve the overall monitoring system performance degraded by the failures.
Keywords :
computer network management; quality of service; system recovery; telecommunication network reliability; failure detection; hierarchical distributed monitoring; monitoring manager fault-tolerance; network monitoring; recovery scheme; Computer crashes; Computer science; Computerized monitoring; Condition monitoring; Control systems; Fault tolerance; Grid computing; Information management; Peer to peer computing; Telecommunication traffic
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Future Generation Communication and Networking (FGCN 2007)
Conference_Location :
Jeju
Print_ISBN :
0-7695-3048-6
Type :
conf
DOI :
10.1109/FGCN.2007.114
Filename :
4426294