DocumentCode :
3548966
Title :
Multigrid Methods for Policy Evaluation and Reinforcement Learning
Author :
Ziv, Omer ; Shimkin, Nahum
Author_Institution :
Dept. of Electr. Eng., Technion, Haifa
fYear :
2005
fDate :
27-29 June 2005
Firstpage :
1391
Lastpage :
1396
Abstract :
We introduce a new class of multigrid temporal-difference learning algorithms for speeding up the estimation of the value function associated with a stationary policy, within the context of discounted-cost Markov decision processes with linear function approximation. The proposed scheme builds on the multigrid framework used in numerical analysis to accelerate the iterative solution of linear equations. We first apply the multigrid approach to policy evaluation in the known-model case. We then extend this approach to the learning case and propose a scheme in which the basic TD(lambda) learning algorithm is applied at various resolution scales. The efficacy of the proposed algorithms is demonstrated through simulation experiments.
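The coarse-to-fine idea described in the abstract can be illustrated with a minimal sketch: run TD(lambda) with linear function approximation on a coarse state-aggregation feature grid, prolong the learned weights to a finer grid, and continue learning there. This is our own simplified illustration of the general multigrid TD approach, not the authors' algorithm; the chain environment, feature construction, and all parameter values below are assumptions chosen for demonstration.

```python
import numpy as np

def aggregation_features(n_states, n_features):
    """Piecewise-constant (state-aggregation) features: n_features coarse bins."""
    phi = np.zeros((n_states, n_features))
    for s in range(n_states):
        phi[s, s * n_features // n_states] = 1.0
    return phi

def td_lambda(phi, w, episodes, gamma=0.95, lam=0.8, alpha=0.1, seed=0):
    """TD(lambda) policy evaluation on a symmetric random-walk chain.

    Reward 1 on exiting at the right edge, 0 elsewhere (illustrative MDP).
    """
    rng = np.random.default_rng(seed)
    n_states = phi.shape[0]
    for _ in range(episodes):
        s = n_states // 2
        z = np.zeros_like(w)                      # eligibility trace
        while True:
            s2 = s + rng.choice([-1, 1])          # fixed stationary policy
            done = s2 < 0 or s2 >= n_states
            r = 1.0 if s2 >= n_states else 0.0
            v = phi[s] @ w
            v2 = 0.0 if done else phi[s2] @ w
            delta = r + gamma * v2 - v            # TD error
            z = gamma * lam * z + phi[s]
            w = w + alpha * delta * z
            if done:
                break
            s = s2
    return w

def prolong(w_coarse, n_fine):
    """Inject coarse-grid weights into a finer aggregation grid."""
    k = n_fine // len(w_coarse)
    return np.repeat(w_coarse, k).astype(float)

n_states = 32
# Coarse pass (8 features), then refine to the full 32-feature grid.
w = td_lambda(aggregation_features(n_states, 8), np.zeros(8), episodes=200)
w = td_lambda(aggregation_features(n_states, 32), prolong(w, 32), episodes=200)
values = aggregation_features(n_states, 32) @ w
```

The coarse pass cheaply captures the large-scale shape of the value function; the fine pass then only has to correct higher-resolution detail, which is the intuition multigrid methods exploit.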
Keywords :
Markov processes; differential equations; iterative methods; learning (artificial intelligence); optimal control; discounted cost Markov decision process; iterative solution; linear equations; linear functional approximation; multigrid method; multigrid temporal-difference learning algorithm; numerical analysis; policy evaluation; reinforcement learning; stationary policy; value function estimation; Computational complexity; Convergence; Dynamic programming; Equations; Error correction; Function approximation; Iterative algorithms; Learning; Multigrid methods; State-space methods;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Proceedings of the 2005 IEEE International Symposium on Intelligent Control and the Mediterranean Conference on Control and Automation
Conference_Location :
Limassol
ISSN :
2158-9860
Print_ISBN :
0-7803-8936-0
Type :
conf
DOI :
10.1109/.2005.1467218
Filename :
1467218