• DocumentCode
    2114750
  • Title
    A policy gradient method for SMDPs with application to call admission control
  • Author
    Singh, Sumeetpal ; Tadic, Vladislav ; Doucet, Arnaud
  • Author_Institution
    Dept. of Electr. & Electron. Eng., Melbourne Univ., Parkville, Vic., Australia
  • Volume
    3
  • fYear
    2002
  • fDate
    2-5 Dec. 2002
  • Firstpage
    1268
  • Abstract
    Classical methods for solving a semi-Markov decision process, such as value iteration and policy iteration, require precise knowledge of the underlying probabilistic model and are known to suffer from the curse of dimensionality. To overcome both of these limitations, this paper presents a reinforcement learning approach in which one directly optimises the performance criterion with respect to a family of parameterised policies. We propose an online algorithm that simultaneously estimates the gradient of the performance criterion and optimises it through stochastic approximation. The gradient estimator is based on the discounted score method as introduced. We demonstrate the utility of our algorithm in a Call Admission Control problem.
  • Keywords
    Markov processes; estimation theory; gradient methods; learning (artificial intelligence); optimisation; telecommunication congestion control; Markov decision process; call admission control problem; classical methods; dimensionality; gradient estimator; online algorithm; parameterised policies; performance criterion; policy gradient method; probabilistic model; reinforcement learning approach; stochastic approximation; value iteration; Call admission control; Convergence; Cost function; Gradient methods; Optimal control; Signal processing; Tiles; Tires;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Control, Automation, Robotics and Vision, 2002. ICARCV 2002. 7th International Conference on
  • Print_ISBN
    981-04-8364-3
  • Type
    conf
  • DOI
    10.1109/ICARCV.2002.1234955
  • Filename
    1234955