DocumentCode
391043
Title
Gradient-based policy iteration: an example
Author
Cao, Xi-Ren ; Fang, Hai-Tao
Author_Institution
Dept. of Electr. & Electron. Eng., Hong Kong Univ. of Sci. & Technol., China
Volume
3
fYear
2002
fDate
10-13 Dec. 2002
Firstpage
3367
Abstract
Research indicates that perturbation analysis (PA), Markov decision processes (MDPs), and reinforcement learning (RL) are three closely related areas in discrete event dynamic system optimization. In particular, it has been shown that policy iteration in fact chooses, for the next iteration, the policy with the steepest performance gradient (provided by PA). This sensitivity-based view of MDPs leads to some new research topics. We propose to implement policy iteration based on performance gradients. This approach is particularly useful when the actions at different states are correlated, so that standard policy iteration cannot be applied. We illustrate the main ideas with an example of an M/G/1/N queue and identify some further research topics.
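A minimal sketch of the sensitivity-based policy-iteration step the abstract refers to, assuming a finite average-reward MDP with one transition matrix and reward vector per action; the function names and the two-state example are hypothetical illustrations, not the paper's M/G/1/N queue or its correlated-action setting. The improvement step picks, in each state, the action with the largest value of f(i,a) + sum_j P_a(i,j) g(j), where the potentials g solve the Poisson equation (I - P) g + eta*1 = f; this is the "steepest performance gradient" choice supplied by PA.

import numpy as np

def stationary_distribution(P):
    # Stationary distribution pi of an ergodic transition matrix P.
    n = P.shape[0]
    A = np.vstack([P.T - np.eye(n), np.ones(n)])
    b = np.zeros(n + 1); b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi

def potentials(P, f):
    # Average reward eta and performance potentials g from the Poisson equation,
    # normalized so that pi @ g = 0.
    pi = stationary_distribution(P)
    eta = pi @ f
    n = P.shape[0]
    g = np.linalg.solve(np.eye(n) - P + np.outer(np.ones(n), pi), f - eta)
    return eta, g

def gradient_based_policy_iteration(P_a, f_a, max_iter=100):
    # P_a[a] is the transition matrix and f_a[a] the reward vector when action a
    # is used in every state; a policy maps each state to an action index.
    n = P_a[0].shape[0]
    policy = np.zeros(n, dtype=int)
    for _ in range(max_iter):
        P = np.array([P_a[policy[i]][i] for i in range(n)])
        f = np.array([f_a[policy[i]][i] for i in range(n)])
        eta, g = potentials(P, f)
        # Improvement: per state, choose the action with the steepest
        # performance gradient, i.e. the largest f(i,a) + P_a(i,:) @ g.
        scores = np.array([[f_a[a][i] + P_a[a][i] @ g for a in range(len(P_a))]
                           for i in range(n)])
        new_policy = scores.argmax(axis=1)
        if np.array_equal(new_policy, policy):
            return policy, eta
        policy = new_policy
    return policy, eta

if __name__ == "__main__":
    # Hypothetical 2-state, 2-action example (for illustration only).
    P_a = [np.array([[0.9, 0.1], [0.2, 0.8]]),
           np.array([[0.5, 0.5], [0.6, 0.4]])]
    f_a = [np.array([1.0, 0.0]), np.array([0.5, 2.0])]
    policy, eta = gradient_based_policy_iteration(P_a, f_a)
    print("policy:", policy, "average reward:", eta)

When the actions at different states are correlated, the per-state argmax above is no longer available, which is exactly the case the paper addresses by working directly with the performance gradients.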
Keywords
Markov processes; decision theory; discrete event systems; gradient methods; iterative methods; learning (artificial intelligence); probability; queueing theory; M/G/1/N queue; Markov decision processes; Q-learning; W-factors; discrete event dynamic system optimization; gradient-based policy iteration; performance gradients; perturbation analysis; reinforcement learning; sensitivity; Control systems; Convergence; Laboratories; Mathematics; Optimization; Performance analysis; Poisson equations; Stochastic processes; System performance; User-generated content;
fLanguage
English
Publisher
ieee
Conference_Title
Proceedings of the 41st IEEE Conference on Decision and Control, 2002
ISSN
0191-2216
Print_ISBN
0-7803-7516-5
Type
conf
DOI
10.1109/CDC.2002.1184395
Filename
1184395
Link To Document