Title :
Monte-Carlo utility estimates for Bayesian reinforcement learning
Author :
Dimitrakakis, Christos
Author_Institution :
Chalmers University of Technology, Gothenburg, Sweden
Abstract :
This paper presents algorithms for Monte-Carlo Bayesian reinforcement learning. First, Monte-Carlo estimates of upper bounds on the Bayes-optimal value function are used to construct an optimistic policy. Second, gradient-based algorithms for computing approximate bounds are introduced. Finally, a new class of gradient algorithms for Bayesian Bellman error minimisation is proposed. Theoretically, the gradient methods are shown to be sound. Experimentally, the upper-bound method obtains the most reward, with the Bayesian Bellman error method a close second despite its computational simplicity.
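Example :
A minimal Python sketch of the Monte-Carlo upper bound described in the abstract, not the paper's published algorithm: the Dirichlet belief, known rewards, and all names here are illustrative assumptions. Sampling MDPs from the posterior, solving each exactly, and averaging the optimal values estimates E_beta[V*_mu(s)], which upper-bounds the Bayes-optimal value V*_beta(s) because the mean of the maxima dominates the maximum of the mean.

import numpy as np

rng = np.random.default_rng(0)

n_states, n_actions, gamma = 5, 2, 0.95

# Assumed belief: independent Dirichlet posteriors over each transition
# row, with the reward function taken as known (illustrative only).
dirichlet_alpha = np.ones((n_states, n_actions, n_states))
rewards = rng.uniform(size=(n_states, n_actions))

def sample_mdp():
    """Draw one transition kernel P[s, a, s'] from the Dirichlet belief."""
    P = np.empty((n_states, n_actions, n_states))
    for s in range(n_states):
        for a in range(n_actions):
            P[s, a] = rng.dirichlet(dirichlet_alpha[s, a])
    return P

def optimal_value(P, tol=1e-8):
    """Value iteration for the optimal value function of a sampled MDP."""
    V = np.zeros(n_states)
    while True:
        Q = rewards + gamma * (P @ V)   # Q[s, a]; (S,A,S) @ (S,) -> (S,A)
        V_new = Q.max(axis=1)
        if np.abs(V_new - V).max() < tol:
            return V_new
        V = V_new

K = 64  # number of posterior samples
V_upper = np.mean([optimal_value(sample_mdp()) for _ in range(K)], axis=0)
print("Monte-Carlo upper bound on the Bayes-optimal value:", V_upper)

An optimistic policy in the spirit of the abstract would then act greedily with respect to such upper-bound estimates; how the bound is tightened and turned into a policy is specified in the paper itself.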
Keywords :
Monte Carlo methods; belief networks; gradient methods; learning (artificial intelligence); utility theory; Bayes-optimal value function; Bayesian Bellman error minimisation; Bayesian reinforcement learning; Monte-Carlo utility estimates; approximate bounds; computational simplicity; gradient-based algorithms; optimistic policy; upper bound estimation; computational modeling
Conference_Title :
2013 IEEE 52nd Annual Conference on Decision and Control (CDC)
Conference_Location :
Florence, Italy
Print_ISBN :
978-1-4673-5714-2
DOI :
10.1109/CDC.2013.6761048