DocumentCode :
2990228
Title :
A dependable system based on adaptive monitoring and replication
Author :
Matsumoto, Keinosuke ; Tanimoto, Akifumi ; Mori, Naoki
Author_Institution :
Grad. Sch. of Eng., Osaka Prefecture Univ., Sakai, Japan
fYear :
2011
fDate :
4-8 July 2011
Firstpage :
336
Lastpage :
342
Abstract :
Multi-agent systems (MAS) have recently gained attention as a method for handling competition and cooperation in distributed systems. However, the vulnerability of a MAS to the propagation of failures prevents it from being applied to large-scale systems. This paper proposes a method to improve the reliability and efficiency of distributed systems. Specifically, the paper deals with the issue of fault tolerance. Distributed systems are characterized by a large number of agents that interact according to complex patterns. The effects of a localized failure may spread across the whole network, depending on the structure of the interdependences between agents. The method monitors messages between agents to detect undesirable behaviors such as failures. From the collected information, the method generates global information on the interdependence between agents and expresses it as a graph. This interdependence graph makes it possible to detect or predict undesirable behaviors. This paper also shows that the method can optimize the performance of a MAS and adaptively improve its reliability in complicated and dynamic environments by applying the global information acquired from the interdependence graph to a replication system. The advantages of the proposed method are illustrated through simulation experiments based on a virtual auction market.
Keywords :
distributed processing; graph theory; multi-agent systems; software fault tolerance; system monitoring; adaptive monitoring; dependable system; distributed systems; dynamic environment; fault tolerance; interdependence graph; large-scale system; multiagent system; reliability; replication system; Adaptation models; Adaptive systems; Fault tolerance; Fault tolerant systems; Monitoring; Servers; Agent and Multi Agent Based Applications; Fault Tolerance; Network Flow and Congestion; Reliable Parallel and Distributed Algorithms; Replication;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
High Performance Computing and Simulation (HPCS), 2011 International Conference on
Conference_Location :
Istanbul
Print_ISBN :
978-1-61284-380-3
Type :
conf
DOI :
10.1109/HPCSim.2011.5999843
Filename :
5999843