DocumentCode
3538685
Title
Monte-Carlo utility estimates for Bayesian reinforcement learning
Author
Dimitrakakis, Christos
Author_Institution
Chalmers Univ. of Technol., Gothenburg, Sweden
fYear
2013
fDate
10-13 Dec. 2013
Firstpage
7303
Lastpage
7308
Abstract
This paper presents algorithms for Monte-Carlo Bayesian reinforcement learning. First, Monte-Carlo estimates of upper bounds on the Bayes-optimal value function are used to construct an optimistic policy. Second, gradient-based algorithms for computing approximate bounds are introduced. Finally, a new class of gradient algorithms for Bayesian Bellman error minimisation is proposed. It is shown theoretically that the gradient methods are sound. Experimentally, the upper-bound method obtains the most reward, with the Bayesian Bellman error method a close second despite its computational simplicity.
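Example
The optimistic policy mentioned in the abstract rests on the fact that the Bayes-optimal value satisfies max_pi E_{mu~beta}[V^pi_mu] <= E_{mu~beta}[max_pi V^pi_mu] = E_{mu~beta}[V*_mu], so averaging the optimal values of MDPs sampled from the posterior gives a Monte-Carlo upper bound. The sketch below illustrates only this idea; it assumes a discrete MDP with a Dirichlet posterior over transitions and known rewards, and the function names (sample_mdp, optimal_value, mc_upper_bound) are illustrative rather than taken from the paper.

    import numpy as np

    def sample_mdp(rng, counts):
        # Draw one transition kernel P[s, a, s'] from a Dirichlet
        # posterior; counts[s, a] holds prior + observed visit counts.
        S, A, _ = counts.shape
        return np.stack([[rng.dirichlet(counts[s, a]) for a in range(A)]
                         for s in range(S)])

    def optimal_value(P, R, gamma=0.95, iters=500):
        # Value iteration: the optimal value V*_mu of one sampled MDP.
        # R[s, a] is the (assumed known) expected reward.
        V = np.zeros(R.shape[0])
        for _ in range(iters):
            V = (R + gamma * P @ V).max(axis=1)
        return V

    def mc_upper_bound(rng, counts, R, n_samples=32, gamma=0.95):
        # Average of V*_mu over posterior samples: a Monte-Carlo
        # estimate of E_beta[V*_mu], an upper bound on V*(beta).
        return np.mean([optimal_value(sample_mdp(rng, counts), R, gamma)
                        for _ in range(n_samples)], axis=0)

    # Hypothetical usage with uniform Dirichlet priors:
    # rng = np.random.default_rng(0)
    # V_upper = mc_upper_bound(rng, np.ones((S, A, S)), R)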
Keywords
Monte Carlo methods; belief networks; gradient methods; learning (artificial intelligence); utility theory; Bayes-optimal value function; Bayesian Bellman error minimisation; Bayesian reinforcement learning; Monte-Carlo utility estimates; approximate bounds; computational simplicity; gradient-based algorithms; optimistic policy; upper bound estimation; computational modeling
fLanguage
English
Publisher
ieee
Conference_Title
2013 IEEE 52nd Annual Conference on Decision and Control (CDC)
Conference_Location
Florence, Italy
ISSN
0743-1546
Print_ISBN
978-1-4673-5714-2
Type
conf
DOI
10.1109/CDC.2013.6761048
Filename
6761048