DocumentCode :
2832977
Title :
Decentralized Load Balancing for Improving Reliability in Heterogeneous Distributed Systems
Author :
Pezoa, Jorge E. ; Dhakal, Sagar ; Hayat, Majeed M.
Author_Institution :
Univ. of New Mexico, Albuquerque, NM, USA
fYear :
2009
fDate :
22-25 Sept. 2009
Firstpage :
214
Lastpage :
221
Abstract :
A probabilistic analytical framework for decentralized load balancing (LB) strategies for heterogeneous distributed-computing systems (DCSs) is presented, with the overall goal of maximizing the service reliability in the presence of random failures. The service reliability of a DCS is defined as the probability of successfully serving a specified workload before all the computing nodes fail permanently. In the framework considered, the service and failure times of nodes are random, the communication times in the network are both tangible and stochastic, and LB is performed synchronously by all the nodes during the runtime of each submitted workload. By taking a novel regenerative stochastic-analysis approach, the service reliability of a two-node DCS is characterized analytically. This formulation, in turn, is used to form and solve an optimization problem, yielding LB policies with maximal reliability. A scalable extension of the two-node formulation to a system of arbitrary size is also presented. The validity of the proposed theory is studied using both Monte-Carlo simulations and real experiments on a small-scale testbed.
Keywords :
Monte Carlo methods; distributed processing; probability; queueing theory; resource allocation; software reliability; stochastic processes; Monte-Carlo simulations; decentralized load balancing; distributed computing system; heterogeneous distributed systems; optimization problem; probabilistic analytical framework; queuing theory; regenerative stochastic-analysis approach; service reliability; Concurrent computing; Distributed computing; Distributed control; Load management; Parallel processing; Reliability theory; Runtime; Stochastic processes; Telecommunication network reliability; Testing; distributed computing; load balancing; queuing theory; reliability; renewal theory;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Parallel Processing Workshops, 2009. ICPPW '09. International Conference on
Conference_Location :
Vienna
ISSN :
1530-2016
Print_ISBN :
978-1-4244-4923-1
Electronic_ISBN :
1530-2016
Type :
conf
DOI :
10.1109/ICPPW.2009.50
Filename :
5364246