Two Online Learning Playout Policies in Monte Carlo Go: An Application of Win/Loss States

Author

Basaldua, Jacques ; Stewart, Steven ; Moreno-Vega, J. Marcos ; Drake, Peter D.

Author_Institution

Dept. de Estadistica IO y Comput., Univ. de La Laguna, La Laguna, Spain

Volume

6

Issue

1

fYear

2014

fDate

March 2014

Firstpage

46

Lastpage

54

Abstract

Recently, Monte Carlo tree search (MCTS) has become the dominant algorithm in Computer Go. This paper compares two simulation algorithms known as playout policies. The base policy includes some mandatory domain-specific knowledge such as seki and urgency patterns, but is still simple to implement. The more advanced learning policy combines two different learning algorithms with those implemented in the base policy. This policy makes use of win/loss states (WLSs) to learn win rates for large sets of features. A very large experimental series of 7960 games includes results for different board sizes, in self-play and against a reference opponent: Fuego. Results are given for equal numbers of simulations and equal central processing unit (CPU) allocation. The improvement is around 100 Elo points, even with equal CPU allocation, and it increases with the number of simulations. Analyzing the proportion of moves generated by each part of the policy and the individual impact of each part provides further insight into how the policy is learning.

Keywords

Monte Carlo methods; computer games; learning (artificial intelligence); tree searching; CPU; Elo points; FUEGO; MCTS; Monte Carlo Go; Monte Carlo tree search; central processing unit allocation; computer Go; learning algorithms; learning policy; mandatory domain-specific knowledge; online learning playout policies; seki patterns; urgency patterns; win-loss states; Computational modeling; Context; Games; Resource management; Shape; Tracking; Knowledge discovery; statistical learning; stochastic systems

fLanguage

English

Journal_Title

Computational Intelligence and AI in Games, IEEE Transactions on

Publisher

IEEE

ISSN

1943-068X

Type

jour

DOI

10.1109/TCIAIG.2013.2292565

Filename

6675777