DocumentCode
404687
Title
Policy gradient stochastic approximation algorithms for adaptive control of constrained time varying Markov decision processes
Author
Vázquez-Abad, Felisa J. ; Krishnamurthy, Vikram
Author_Institution
Departement d'Informatique et Recherche Oper., Montreal Univ., Que., Canada
Volume
3
fYear
2003
fDate
9-12 Dec. 2003
Firstpage
2823
Abstract
We present constrained stochastic approximation algorithms for computing the locally optimal policy of a constrained average-cost finite-state Markov decision process. These algorithms require the gradient of the cost function with respect to the parameter that characterizes the randomized policy; the gradient is computed by novel simulation-based estimation schemes involving weak derivatives. The proposed algorithms are simulation-based and do not require explicit knowledge of underlying parameters such as the transition probabilities. We present three classes of algorithms, based on primal-dual methods, augmented Lagrangian (multiplier) methods, and gradient projection primal methods. Unlike neuro-dynamic programming methods such as Q-learning, the proposed algorithms can handle constraints and time-varying parameters.
Keywords
Markov processes; adaptive control; approximation theory; constraint handling; decision theory; gradient methods; time-varying systems; augmented Lagrangian methods; average cost finite state Markov decision process; constrained time varying Markov decision processes; gradient estimation schemes; gradient projection primal methods; policy gradient stochastic approximation; weak derivatives; Approximation algorithms; Computational modeling; Cost function; Kernel; Lagrangian functions; Optimal control; Robustness; State-space methods; Stochastic processes
fLanguage
English
Publisher
IEEE
Conference_Titel
Decision and Control, 2003. Proceedings. 42nd IEEE Conference on
ISSN
0191-2216
Print_ISBN
0-7803-7924-1
Type
conf
DOI
10.1109/CDC.2003.1273053
Filename
1273053