• DocumentCode
    3349224
  • Title

    A dynamic replica selection algorithm for tolerating timing faults

  • Author

    Krishnamurthy, Sudha ; Sanders, William H. ; Cukier, Michel

  • Author_Institution
    Center for Reliable & High Performance Comput., Illinois Univ., Urbana, IL, USA
  • fYear
    2001
  • fDate
    1-4 July 2001
  • Firstpage
    107
  • Lastpage
    116
  • Abstract
    Server replication is commonly used to improve the fault tolerance and response time of distributed services. An important problem when executing time-critical applications in a replicated environment is that of preventing timing failures by dynamically selecting the replicas that can satisfy a client's timing requirement, even when the quality of service is degraded due to replica failures and excess load on the server. We describe the approach we have used to solve this problem in AQuA, a CORBA-based middleware that transparently replicates objects across a local area network. The approach we use estimates a replica's response time distribution based on performance measurements regularly broadcast by the replica. An online model uses these measurements to predict the probability with which a replica can prevent a timing failure for a client. A selection algorithm then uses this prediction to choose a subset of replicas that can together meet the client's timing constraints with at least the probability requested by the client. We conclude with experimental results based on our implementation.
  • Keywords
    client-server systems; distributed object management; fault tolerant computing; local area networks; quality of service; AQuA; CORBA-based middleware; client; distributed services; dynamic replica selection algorithm; local area network; replica failures; response time; server replication; time-critical applications; timing failures; timing fault tolerance; Degradation; Delay; Fault tolerance; Heuristic algorithms; Local area networks; Middleware; Network servers; Quality of service; Time factors; Timing
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Dependable Systems and Networks, 2001. DSN 2001. International Conference on
  • Conference_Location
    Goteborg, Sweden
  • Print_ISBN
    0-7695-1101-5
  • Type

    conf

  • DOI
    10.1109/DSN.2001.941397
  • Filename
    941397