• DocumentCode
    2145003
  • Title

    A Reinforcement-Learning Approach to Failure-Detection Scheduling

  • Author

    Zeng, Fancong

  • Author_Institution
    BEA Systems, Inc., Liberty Corner
  • fYear
    2007
  • fDate
    11-12 Oct. 2007
  • Firstpage
    161
  • Lastpage
    170
  • Abstract
    A failure-detection scheduler for an online production system must strike a tradeoff between performance and reliability. If failure-detection processes are run too frequently, valuable system resources are spent checking and rechecking for failures. However, if failure-detection processes are run too rarely, a failure can remain undetected for a long time. In both cases, system performability suffers. We present a model-based learning approach that estimates the failure rate and then performs an optimization to find the tradeoff that maximizes system performability. We show that our approach is not only theoretically sound but practically effective, and we demonstrate its use in an implemented automated deadlock-detection system for Java.
  • Keywords
    decision theory; learning (artificial intelligence); scheduling; software reliability; system recovery; Java; automated deadlock detection; failure detection scheduling; online production system; optimization; reinforcement learning; Convergence; Costs; Exponential distribution; Frequency; Production systems; System performance; Upper bound
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Seventh International Conference on Quality Software (QSIC '07), 2007
  • Conference_Location
    Portland, OR
  • ISSN
    1550-6002
  • Print_ISBN
    978-0-7695-3035-2
  • Type

    conf

  • DOI
    10.1109/QSIC.2007.4385492
  • Filename
    4385492