Regional Information Center for Science and Technology - Coevolution versus self-play temporal difference learning for acquiring position evaluation in small-board Go

DocumentCode :

1244324

Title :

Coevolution versus self-play temporal difference learning for acquiring position evaluation in small-board Go

Author :

Runarsson, Thomas Philip ; Lucas, Simon M.

Author_Institution :

Sci. Inst., Univ. of Iceland, Reykjavik, Iceland

Volume :

Issue :

fYear :

2005

Firstpage :

628

Lastpage :

640

Abstract :

Two learning methods for acquiring position evaluation for small Go boards are studied and compared. In each case the function to be learned is a position-weighted piece counter and only the learning method differs. The methods studied are temporal difference learning (TDL) using the self-play gradient-descent method, and coevolutionary learning using an evolution strategy. The two approaches are compared in the hope of gaining greater insight into the problem of searching for "optimal" zero-sum game strategies. Using tuned standard setups for each algorithm, it was found that the temporal-difference method learned faster, and in most cases also achieved a higher level of play than coevolution, provided that the gradient-descent step size was chosen suitably. The performance of the coevolution method was found to be sensitive to the design of the evolutionary algorithm in several respects. Given the right configuration, however, coevolution achieved a higher level of play than TDL. Self-play results in optimal play against a copy of itself. A self-play player will prefer moves from which it is unlikely to lose even when it occasionally makes random exploratory moves. An evolutionary player forced to perform exploratory moves in the same way can achieve strategies superior to those acquired through self-play alone. The reason for this is that the evolutionary player is exposed to more varied game-play, because it plays against a diverse population of players.

Keywords :

evolutionary computation; game theory; gradient methods; learning (artificial intelligence); coevolutionary learning; evolution strategy; optimal zero sum game strategy; position evaluation; position weighted piece counter; self-play gradient descent method; small Go board; temporal difference learning; Algorithm design and analysis; Collaboration; Computational intelligence; Computer architecture; Counting circuits; Genetics; Learning systems; Organisms; Testing; Coevolution; game strategies; reinforcement learning

fLanguage :

English

Journal_Title :

Evolutionary Computation, IEEE Transactions on

Publisher :

IEEE

ISSN :

1089-778X

Type :

jour

DOI :

10.1109/TEVC.2005.856212

Filename :

1545939

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=1244324