DocumentCode :
2995495
Title :
Achieving Robust Self-Management for Large-Scale Distributed Applications
Author :
Al-Shishtawy, Ahmad ; Fayyaz, Muhammad Asif ; Popov, Konstantin ; Vlassov, Vladimir
Author_Institution :
R. Inst. of Technol., Stockholm, Sweden
fYear :
2010
fDate :
Sept. 27 2010-Oct. 1 2010
Firstpage :
31
Lastpage :
40
Abstract :
Achieving self-management can be challenging, particularly in dynamic environments with resource churn (joins/leaves/failures). Dealing with the effect of churn on management increases the complexity of the management logic and thus makes its development time-consuming and error-prone. We propose the abstraction of robust management elements (RMEs), which are able to heal themselves under continuous churn. Using RMEs allows the developer to separate the issue of dealing with the effect of churn on management from the management logic. This facilitates the development of robust management by letting the developer focus on managing the application while relying on the platform to provide the robustness of management. RMEs can be implemented as fault-tolerant long-living services. We present a generic approach and an associated algorithm to achieve fault-tolerant long-living services. Our approach is based on replicating a service using finite state machine replication with a reconfigurable replica set. Our algorithm automates the reconfiguration (migration) of the replica set in order to tolerate continuous churn. The algorithm uses P2P replica placement schemes to place replicas and uses the P2P overlay to monitor them. The replicated state machine is extended to analyze monitoring data in order to decide when and where to migrate. We describe how to use our approach to achieve robust management elements. We present a simulation-based evaluation of our approach that shows its feasibility.
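The abstract names P2P replica placement and overlay-based monitoring as the inputs to the migration decision, but it does not say which placement scheme is used. The Python sketch below is illustrative only and rests on stated assumptions: symmetric replication on a ring identifier space (replica keys spaced ring_size/f apart), successor-based responsibility for each key, and a majority-based replicated state machine that can only agree on a reconfiguration while more than half of the current replicas are alive. All function names (replica_keys, successor, desired_configuration, plan_reconfiguration) are hypothetical and are not taken from the paper.

```python
from bisect import bisect_left
from typing import Iterable, List, Optional, Sequence


def replica_keys(service_id: int, f: int, ring_size: int) -> List[int]:
    """Assumed scheme (symmetric replication): spread f replica keys evenly around the ring."""
    step = ring_size // f
    return [(service_id + i * step) % ring_size for i in range(f)]


def successor(key: int, nodes: Sequence[int]) -> int:
    """Return the first node ID >= key on the sorted ring, wrapping to the smallest ID."""
    i = bisect_left(nodes, key)
    return nodes[i % len(nodes)]


def desired_configuration(service_id: int, f: int, ring_size: int,
                          alive_nodes: Iterable[int]) -> List[int]:
    """Map each replica key to the live node currently responsible for it.

    With very few nodes, several keys may map to the same node; a real
    system would have to handle that case, which this sketch ignores.
    """
    nodes = sorted(set(alive_nodes))
    return [successor(k, nodes) for k in replica_keys(service_id, f, ring_size)]


def plan_reconfiguration(current: Sequence[int], alive_nodes: Iterable[int],
                         service_id: int, f: int, ring_size: int) -> Optional[List[int]]:
    """Return the replica set to migrate to, or None if no migration is needed.

    Assumes a majority-based replicated state machine: the current replicas
    can only agree on a new configuration while a majority of them is alive.
    """
    alive = set(alive_nodes)
    survivors = sum(1 for n in current if n in alive)
    if survivors <= len(current) // 2:
        raise RuntimeError("majority of replicas lost; cannot agree on reconfiguration")
    target = desired_configuration(service_id, f, ring_size, alive)
    return None if target == list(current) else target
```

For example, with a 64-position ring, four replicas of service 5 and live nodes 3, 10, 25, 30, 40 and 58, the replicas sit on nodes 10, 25, 40 and 58. If node 25 fails, the sketch proposes migrating to nodes 10, 30, 40 and 58, because node 30 has become the successor of replica key 21.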
Keywords :
fault tolerant computing; finite state machines; peer-to-peer computing; P2P replica placement schemes; fault-tolerant long-living services; finite state machine replication; large-scale distributed applications; management logic; reconfigurable replica set; resource churn; robust management elements; robust self-management; Arrays; Computational modeling; Fault tolerance; Fault tolerant systems; Lead; Monitoring; Robustness; P2P; autonomic computing; distributed systems; replicated state machines; self-management; service migration;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Self-Adaptive and Self-Organizing Systems (SASO), 2010 4th IEEE International Conference on
Conference_Location :
Budapest
Print_ISBN :
978-1-4244-8537-6
Electronic_ISBN :
978-0-7695-4232-4
Type :
conf
DOI :
10.1109/SASO.2010.42
Filename :
5630643