DocumentCode :
589217
Title :
Monte Carlo Tree Search for Bayesian Reinforcement Learning
Author :
Ngo Anh Vien; Ertel, Wolfgang
Author_Institution :
Inst. of Artificial Intell., Ravensburg-Weingarten Univ. of Appl. Sci., Weingarten, Germany
Volume :
1
fYear :
2012
fDate :
12-15 Dec. 2012
Firstpage :
138
Lastpage :
143
Abstract :
Bayesian model-based reinforcement learning can be formulated as a partially observable Markov decision process (POMDP) to provide a principled framework for optimally balancing exploitation and exploration. A POMDP solver can then be applied to solve the problem. If the prior distribution over the environment's dynamics is a product of Dirichlet distributions, the POMDP's optimal value function can be represented by a set of multivariate polynomials. Unfortunately, the size of the polynomials grows exponentially with the problem horizon. In this paper, we examine the use of an online Monte-Carlo tree search (MCTS) algorithm for large POMDPs to solve the Bayesian reinforcement learning problem online. We show that such an algorithm successfully searches for a near-optimal policy. In addition, we examine the use of a parameter tying method to keep the model search space small, and propose the use of a nested mixture of tied models to increase the robustness of the method when our prior information does not allow us to specify the structure of the tied models exactly. Experiments show that the proposed methods substantially improve the scalability of current Bayesian reinforcement learning methods.
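For orientation, the sketch below illustrates the general flavor of online MCTS planning in a Bayes-adaptive setting with Dirichlet priors over transition dynamics: each simulation samples an MDP from the posterior at the root and runs a UCT rollout on it. This is a minimal illustrative sketch, not the authors' algorithm; the state/action sizes, reward table, and constants (N_STATES, N_ACTIONS, rewards, C_UCT, etc.) are assumed for demonstration only.

```python
# Minimal sketch of Bayes-adaptive Monte-Carlo tree search with root
# sampling. NOT the paper's implementation: problem sizes, rewards and
# hyperparameters below are illustrative assumptions.
import math
from collections import defaultdict

import numpy as np

N_STATES, N_ACTIONS = 5, 2            # assumed toy problem size
GAMMA, C_UCT, DEPTH, SIMS = 0.95, 1.4, 15, 500

# Dirichlet counts over next states for each (s, a): the Bayesian model.
alpha = np.ones((N_STATES, N_ACTIONS, N_STATES))
# Illustrative reward table (random, for demonstration only).
rewards = np.random.rand(N_STATES, N_ACTIONS)

N = defaultdict(int)      # visit counts for (history, action)
Nh = defaultdict(int)     # visit counts for history
Q = defaultdict(float)    # action-value estimates


def ucb_action(h):
    """UCB1 action selection at history node h."""
    return max(range(N_ACTIONS),
               key=lambda a: Q[(h, a)] + C_UCT * math.sqrt(
                   math.log(Nh[h] + 1) / (N[(h, a)] + 1e-6)))


def simulate(s, h, depth, P):
    """One UCT rollout on the MDP with sampled transition matrix P."""
    if depth == 0:
        return 0.0
    a = ucb_action(h)
    s2 = np.random.choice(N_STATES, p=P[s, a])
    ret = rewards[s, a] + GAMMA * simulate(s2, h + ((s, a),), depth - 1, P)
    Nh[h] += 1
    N[(h, a)] += 1
    Q[(h, a)] += (ret - Q[(h, a)]) / N[(h, a)]
    return ret


def plan(s):
    """Pick an action by repeated root sampling from the Dirichlet posterior."""
    for _ in range(SIMS):
        P = np.array([[np.random.dirichlet(alpha[s_, a_])
                       for a_ in range(N_ACTIONS)]
                      for s_ in range(N_STATES)])
        simulate(s, (), DEPTH, P)
    return max(range(N_ACTIONS), key=lambda a: Q[((), a)])


if __name__ == "__main__":
    print("greedy root action:", plan(0))
```

After acting and observing a transition, the corresponding Dirichlet count (e.g. `alpha[s, a, s_next] += 1`) would be updated before the next planning call, so the posterior sharpens as experience accumulates.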
Keywords :
Markov processes; Monte Carlo methods; belief networks; learning (artificial intelligence); polynomials; statistical distributions; tree searching; Bayesian reinforcement learning; Dirichlet distribution; MCTS algorithm; Monte Carlo tree search; POMDP optimal value function; POMDP solver; environment dynamics; multivariate polynomial; near-optimal policy; partially observable Markov decision process; Bayesian methods; Computational modeling; History; Learning; Monte Carlo methods; Planning; Polynomials; Bayesian reinforcement learning; Monte-Carlo tree search; POMDP; model-based reinforcement learning;
fLanguage :
English
Publisher :
ieee
Conference_Title :
2012 11th International Conference on Machine Learning and Applications (ICMLA)
Conference_Location :
Boca Raton, FL
Print_ISBN :
978-1-4673-4651-1
Type :
conf
DOI :
10.1109/ICMLA.2012.30
Filename :
6406602