Title :
Imitating play from game trajectories: Temporal difference learning versus preference learning
Author :
Runarsson, Thomas Philip; Lucas, Simon M.
Abstract :
This work compares the learning of linear evaluation functions by preference learning versus least squares temporal difference learning, LSTD(λ), from samples of game trajectories. The game trajectories are taken from human competitions held by the French Othello Federation. Raw board positions are used as the features of a linear evaluation function, which suffices to illustrate the key difference between the two learning approaches. The results show that the policies learned from exactly the same game trajectories can be quite different. For this simple feature set, preference learning produces policies that better capture the behaviour of expert players and also lead to a higher level of play than those found by LSTD(λ).
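The paper itself contains no code; the following is a minimal Python/NumPy sketch of the two estimators being compared, under stated assumptions: positions are encoded as raw feature vectors φ(s) (e.g. one entry in {−1, 0, +1} per Othello square), the only reward is the terminal game outcome, and γ = 1 as is usual in games. The function names, the ridge term, and the perceptron-style preference solver are illustrative choices of this sketch, not the authors' implementation.

```python
import numpy as np

def lstd_lambda(trajectories, lam=0.9, gamma=1.0, ridge=1e-6):
    """LSTD(lambda): accumulate A w = b over all trajectories and solve for w.

    trajectories: list of (phi, reward), where phi is a (T, d) array of
    successive position features and reward is the terminal outcome.
    """
    d = trajectories[0][0].shape[1]
    A = ridge * np.eye(d)            # small ridge keeps A invertible
    b = np.zeros(d)
    for phi, reward in trajectories:
        z = np.zeros(d)              # eligibility trace
        T = phi.shape[0]
        for t in range(T):
            z = gamma * lam * z + phi[t]
            nxt = phi[t + 1] if t + 1 < T else np.zeros(d)  # terminal -> 0
            r = reward if t + 1 == T else 0.0   # reward only at game end
            A += np.outer(z, phi[t] - gamma * nxt)
            b += r * z
    return np.linalg.solve(A, b)

def preference_weights(pref_pairs, epochs=20, lr=0.1):
    """Pairwise preference learning: the successor position chosen by the
    expert should score higher than each legal alternative. Trained here
    with a simple perceptron update on feature differences.

    pref_pairs: list of (phi_chosen, phi_alternative) feature vectors.
    """
    d = pref_pairs[0][0].shape[0]
    w = np.zeros(d)
    for _ in range(epochs):
        for chosen, alt in pref_pairs:
            diff = chosen - alt
            if w @ diff <= 0:        # preference violated or tied
                w += lr * diff
    return w

# Illustrative use with random stand-ins for real Othello trajectories:
rng = np.random.default_rng(0)
games = [(rng.integers(-1, 2, size=(60, 64)).astype(float), 1.0)]
pairs = [(rng.standard_normal(64), rng.standard_normal(64)) for _ in range(100)]
w_td, w_pref = lstd_lambda(games), preference_weights(pairs)
```

The sketch makes the key contrast concrete: LSTD(λ) fits values to propagated game outcomes, whereas preference learning only constrains the ordering of sibling positions at each expert move, which is why the two can yield quite different policies from the same trajectories.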
Keywords :
games of skill; learning (artificial intelligence); least squares approximations; French Othello Federation; LSTD; board position; expert player behaviour; game trajectory; human competition; learning approach; least squares temporal difference learning; linear evaluation function; play imitation; policy learning; preference learning; Games; Humans; Learning systems; Machine learning; Trajectory; Vectors
Conference_Titel :
2012 IEEE Conference on Computational Intelligence and Games (CIG)
Conference_Location :
Granada, Spain
Print_ISBN :
978-1-4673-1193-9
Electronic_ISBN :
978-1-4673-1192-2
DOI :
10.1109/CIG.2012.6374141