DocumentCode
3548966
Title
Multigrid Methods for Policy Evaluation and Reinforcement Learning
Author
Ziv, Omer ; Shimkin, Nahum
Author_Institution
Dept. of Electr. Eng., Technion, Haifa
fYear
2005
fDate
27-29 June 2005
Firstpage
1391
Lastpage
1396
Abstract
We introduce a new class of multigrid temporal-difference learning algorithms for speeding up the estimation of the value function related to a stationary policy, within the context of discounted cost Markov decision processes with linear functional approximation. The proposed scheme builds on the multigrid framework, which is used in numerical analysis to enhance the iterative solution of linear equations. We first apply the multigrid approach to policy evaluation in the known model case. We then extend this approach to the learning case, and propose a scheme in which the basic TD(lambda) learning algorithm is applied at various resolution scales. The efficacy of the proposed algorithms is demonstrated through simulation experiments.
Keywords
Markov processes; differential equations; iterative methods; learning (artificial intelligence); optimal control; discounted cost Markov decision process; iterative solution; linear equations; linear functional approximation; multigrid method; multigrid temporal-difference learning algorithm; numerical analysis; policy evaluation; reinforcement learning; stationary policy; value function estimation; Computational complexity; Convergence; Dynamic programming; Equations; Error correction; Function approximation; Iterative algorithms; Learning; Multigrid methods; State-space methods
fLanguage
English
Publisher
IEEE
Conference_Titel
Intelligent Control, 2005. Proceedings of the 2005 IEEE International Symposium on, Mediterranean Conference on Control and Automation
Conference_Location
Limassol
ISSN
2158-9860
Print_ISBN
0-7803-8936-0
Type
conf
DOI
10.1109/.2005.1467218
Filename
1467218