• DocumentCode
    2832977
  • Title
    Decentralized Load Balancing for Improving Reliability in Heterogeneous Distributed Systems
  • Author
    Pezoa, Jorge E. ; Dhakal, Sagar ; Hayat, Majeed M.
  • Author_Institution
    Univ. of New Mexico, Albuquerque, NM, USA
  • fYear
    2009
  • fDate
    22-25 Sept. 2009
  • Firstpage
    214
  • Lastpage
    221
  • Abstract
    A probabilistic analytical framework for decentralized load balancing (LB) strategies for heterogeneous distributed-computing systems (DCSs) is presented with the overall goal of maximizing the service reliability in the presence of random failures. The service reliability of a DCS is defined as the probability of successfully serving a specified workload before all the computing nodes fail permanently. In the framework considered, the service and failure times of nodes are random, the communication times in the network are both tangible and stochastic, and LB is performed synchronously by all the nodes during the runtime of each submitted workload. By taking a novel regenerative stochastic-analysis approach, the service reliability of a two-node DCS is characterized analytically. This formulation, in turn, is used to form and solve an optimization problem, yielding LB policies with maximal reliability. A scalable extension of the two-node formulation to an arbitrary-size system is also presented. The validity of the proposed theory is studied using both Monte-Carlo simulations and real experiments on a small-scale testbed.
  • Keywords
    Monte Carlo methods; distributed processing; probability; queueing theory; resource allocation; software reliability; stochastic processes; Monte-Carlo simulations; decentralized load balancing; distributed computing system; heterogeneous distributed systems; optimization problem; probabilistic analytical framework; queuing theory; regenerative stochastic-analysis approach; service reliability; Concurrent computing; Distributed computing; Distributed control; Load management; Parallel processing; Reliability theory; Runtime; Telecommunication network reliability; Testing; load balancing; reliability; renewal theory
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Parallel Processing Workshops, 2009. ICPPW '09. International Conference on
  • Conference_Location
    Vienna
  • ISSN
    1530-2016
  • Print_ISBN
    978-1-4244-4923-1
  • Electronic_ISBN
    1530-2016
  • Type
    conf
  • DOI
    10.1109/ICPPW.2009.50
  • Filename
    5364246