• DocumentCode
    754972
  • Title

    On the quality of service of failure detectors

  • Author

    Chen, Wei ; Toueg, Sam ; Aguilera, Marcos Kawazoe

  • Author_Institution
    Oracle Corp., Nashua, NH, USA
  • Volume
    51
  • Issue
    5
  • fYear
    2002
  • fDate
    5/1/2002 12:00:00 AM
  • Firstpage
    561
  • Lastpage
    580
  • Abstract
    We study the quality of service (QoS) of failure detectors. By QoS, we mean a specification that quantifies 1) how fast the failure detector detects actual failures and 2) how well it avoids false detections. We first propose a set of QoS metrics to specify failure detectors for systems with probabilistic behaviors, i.e., for systems where message delays and message losses follow some probability distributions. We then give a new failure detector algorithm and analyze its QoS in terms of the proposed metrics. We show that, among a large class of failure detectors, the new algorithm is optimal with respect to some of these QoS metrics. Given a set of failure detector QoS requirements, we show how to compute the parameters of our algorithm so that it satisfies these requirements, and we show how this can be done even if the probabilistic behavior of the system is not known. We then present some simulation results that show that the new failure detector algorithm provides a better QoS than an algorithm that is commonly used in practice. Finally, we suggest some ways to make our failure detector adaptive to changes in the probabilistic behavior of the network.
  • Keywords
    distributed algorithms; fault tolerant computing; quality of service; QoS; QoS metrics; failure detectors; fault tolerance; probabilistic analysis; Detectors
  • fLanguage
    English
  • Journal_Title
    Computers, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    0018-9340
  • Type

    jour

  • DOI
    10.1109/TC.2002.1004595
  • Filename
    1004595