Title :
A policy gradient method for SMDPs with application to call admission control
Author :
Singh, Sumeetpal ; Tadic, Vladislav ; Doucet, Arnaud
Author_Institution :
Dept. of Electr. & Electron. Eng., Melbourne Univ., Parkville, Vic., Australia
Abstract :
Classical methods for solving a semi-Markov decision process (SMDP), such as value iteration and policy iteration, require precise knowledge of the underlying probabilistic model and are known to suffer from the curse of dimensionality. To overcome both limitations, this paper presents a reinforcement learning approach in which the performance criterion is optimised directly with respect to a family of parameterised policies. We propose an online algorithm that simultaneously estimates the gradient of the performance criterion and optimises it through stochastic approximation. The gradient estimator is based on the discounted score method. We demonstrate the utility of our algorithm on a call admission control problem.
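The abstract's core idea — an online, model-free update that combines a score-function gradient estimate with a discounted trace and a stochastic-approximation step — can be illustrated on a toy call admission control problem. The sketch below is a hypothetical minimal instance (the capacity `C`, the free-capacity feature, the reward values, and all step sizes are assumptions for illustration, not the paper's actual construction): a single-parameter Bernoulli admission policy, a discounted score ("eligibility") trace, and a REINFORCE-style parameter update.

```python
import math
import random

random.seed(0)

C = 10        # link capacity (hypothetical toy value)
beta = 0.9    # discount factor for the score trace (assumed)
step = 0.005  # stochastic-approximation step size (assumed)

theta = 0.0   # single policy parameter
z = 0.0       # discounted score (eligibility) trace
occupancy = 0 # number of calls currently in service

def sigmoid(u):
    """Numerically stable logistic function."""
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    e = math.exp(u)
    return e / (1.0 + e)

for t in range(5000):
    # departures: each call in service leaves with probability 0.1
    occupancy = sum(1 for _ in range(occupancy) if random.random() > 0.1)

    # a call arrives; admit with probability pi_theta(x),
    # where x is a free-capacity feature (assumed parameterisation)
    x = (C - occupancy) / C
    p = sigmoid(theta * x)
    accept = random.random() < p

    # reward: revenue per admitted call, penalty for overload (assumed values)
    r = 0.0
    if accept:
        occupancy += 1
        r = 1.0 if occupancy <= C else -10.0

    # score of the Bernoulli policy: d/dtheta log pi_theta(a | x) = (a - p) * x
    glog = ((1.0 if accept else 0.0) - p) * x

    # discounted score trace plus stochastic-approximation update
    z = beta * z + glog
    theta += step * r * z
```

The trace `z` plays the role of the discounted score accumulator: rewards credit all recent actions, with influence decaying geometrically at rate `beta`, while the small step size gives the averaging behaviour stochastic approximation relies on.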
Keywords :
Markov processes; estimation theory; gradient methods; learning (artificial intelligence); optimisation; telecommunication congestion control; Markov decision process; call admission control problem; classical methods; dimensionality; gradient estimator; online algorithm; parameterised policies; performance criterion; policy gradient method; probabilistic model; reinforcement learning approach; stochastic approximation; value iteration; Call admission control; Convergence; Cost function; Gradient methods; Optimal control; Signal processing
Conference_Title :
7th International Conference on Control, Automation, Robotics and Vision (ICARCV 2002)
Print_ISBN :
981-04-8364-3
DOI :
10.1109/ICARCV.2002.1234955