DocumentCode :
3546907
Title :
Bandits all the way down: UCB1 as a simulation policy in Monte Carlo Tree Search
Author :
Powley, Edward J. ; Whitehouse, Daniel ; Cowling, Peter I.
Author_Institution :
Dept. of Comput. Sci., Univ. of York, York, UK
fYear :
2013
fDate :
11-13 Aug. 2013
Firstpage :
1
Lastpage :
8
Abstract :
Monte Carlo Tree Search (MCTS) is a family of asymmetric anytime aheuristic game tree search algorithms which have advanced the state of the art in several challenging domains. MCTS learns a playout policy, iteratively building a partial tree to store and further refine the learned portion of the policy. When the playout leaves the existing tree, it falls back to a default simulation policy, which for many variants of MCTS chooses actions uniformly at random. This paper investigates how a simulation policy can be learned during the search, helping the playout policy remain plausible from root to terminal state without the injection of prior knowledge. Since the simulation policy visits previously unseen states, its decisions cannot be as context sensitive as those in the tree policy. We consider the well-known Move-Average Sampling Technique (MAST), which learns a value for each move that is independent of context. We also introduce a generalisation of MAST, called N-gram-Average-Sampling-Technique (NAST), which uses as context a fixed-length sequence (or N-tuple) of recent moves. We compare several policies for selecting moves during simulation, including the UCB1 policy for multi-armed bandits (as used in the tree policy for the popular UCT variant of MCTS). In addition to the elegance of treating the entire playout as a series of multi-armed bandit problems, we find that UCB1 gives consistently strong performance. We present empirical results for three games of imperfect information, namely the card games Dou Di Zhu and Hearts and the board game Lord Of The Rings: The Confrontation, each of which has its own unique challenges for search-based AI.
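As a minimal illustration of the UCB1 selection rule the abstract refers to (the function name, statistics table, and data below are illustrative, not taken from the paper — NAST would key such a table on an N-gram of recent moves rather than a single move):

```python
import math
import random

def ucb1_select(stats, total, c=math.sqrt(2)):
    """Pick the move maximising mean reward plus an exploration bonus (UCB1).

    stats maps move -> (visit count, cumulative reward); total is the
    number of selections made from this bandit so far.
    """
    # Any move never tried yet is selected first (its bonus is infinite).
    untried = [move for move, (n, _) in stats.items() if n == 0]
    if untried:
        return random.choice(untried)
    def ucb(move):
        n, reward_sum = stats[move]
        return reward_sum / n + c * math.sqrt(math.log(total) / n)
    return max(stats, key=ucb)

# Hypothetical simulation-policy statistics: move "c" is untried,
# so UCB1 selects it before comparing the exploited moves.
stats = {"a": (10, 7.0), "b": (5, 4.0), "c": (0, 0.0)}
print(ucb1_select(stats, total=15))  # prints "c"
```

With no untried moves, the less-visited arm's exploration bonus can outweigh a rival's higher mean, which is what lets a learned simulation policy keep exploring during playouts.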
Keywords :
Monte Carlo methods; artificial intelligence; computer games; digital simulation; sampling methods; tree searching; Dou Di Zhu; Hearts; Lord Of The Rings: The Confrontation; MAST; MCTS; Monte Carlo tree search; NAST; UCB1; asymmetric anytime aheuristic game tree search algorithms; board game; card games; fixed-length sequence; imperfect information; move-average sampling technique; multiarmed bandit problems; n-gram-average-sampling-technique; partial tree; playout policy; root state; search-based AI; simulation policy; terminal state; Artificial intelligence; Context; Games; Heart; Law; Monte Carlo methods; Wheels;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
Computational Intelligence in Games (CIG), 2013 IEEE Conference on
Conference_Location :
Niagara Falls, ON
ISSN :
2325-4270
Print_ISBN :
978-1-4673-5308-3
Type :
conf
DOI :
10.1109/CIG.2013.6633613
Filename :
6633613