Title :
Imitating play from game trajectories: Temporal difference learning versus preference learning
Author :
Runarsson, Thomas Philip; Lucas, Simon M.
Abstract :
This work compares the learning of linear evaluation functions by preference learning versus least squares temporal difference learning, LSTD(λ), from samples of game trajectories. The game trajectories are taken from human competitions held by the French Othello Federation. Raw board positions are used as the features of a linear evaluation function, which suffices to illustrate the key difference between the two learning approaches. The results show that the policies learned from exactly the same game trajectories can be quite different. For this simple feature set, preference learning produces policies that better capture the behaviour of expert players and also lead to a higher level of play than those found by LSTD(λ).
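The paper itself contains no code; the following is a minimal Python/NumPy sketch of the two estimators being compared, under stated assumptions: positions are encoded as raw feature vectors φ(s) (e.g. one entry in {−1, 0, +1} per Othello square), the only reward is the terminal game outcome, and γ = 1 as is usual in games. The function names, the ridge term, and the perceptron-style preference solver are illustrative choices of this sketch, not the authors' implementation.

```python
import numpy as np

def lstd_lambda(trajectories, lam=0.9, gamma=1.0, ridge=1e-6):
    """LSTD(lambda): accumulate A w = b over all trajectories and solve for w.

    trajectories: list of (phi, reward), where phi is a (T, d) array of
    successive position features and reward is the terminal outcome.
    """
    d = trajectories[0][0].shape[1]
    A = ridge * np.eye(d)            # small ridge keeps A invertible
    b = np.zeros(d)
    for phi, reward in trajectories:
        z = np.zeros(d)              # eligibility trace
        T = phi.shape[0]
        for t in range(T):
            z = gamma * lam * z + phi[t]
            nxt = phi[t + 1] if t + 1 < T else np.zeros(d)  # terminal -> 0
            r = reward if t + 1 == T else 0.0   # reward only at game end
            A += np.outer(z, phi[t] - gamma * nxt)
            b += r * z
    return np.linalg.solve(A, b)

def preference_weights(pref_pairs, epochs=20, lr=0.1):
    """Pairwise preference learning: the successor position chosen by the
    expert should score higher than each legal alternative. Trained here
    with a simple perceptron update on feature differences.

    pref_pairs: list of (phi_chosen, phi_alternative) feature vectors.
    """
    d = pref_pairs[0][0].shape[0]
    w = np.zeros(d)
    for _ in range(epochs):
        for chosen, alt in pref_pairs:
            diff = chosen - alt
            if w @ diff <= 0:        # preference violated or tied
                w += lr * diff
    return w

# Illustrative use with random stand-ins for real Othello trajectories:
rng = np.random.default_rng(0)
games = [(rng.integers(-1, 2, size=(60, 64)).astype(float), 1.0)]
pairs = [(rng.standard_normal(64), rng.standard_normal(64)) for _ in range(100)]
w_td, w_pref = lstd_lambda(games), preference_weights(pairs)
```

The sketch makes the key contrast concrete: LSTD(λ) fits values to propagated game outcomes, whereas preference learning only constrains the ordering of sibling positions at each expert move, which is why the two can yield quite different policies from the same trajectories.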
Keywords :
games of skill; learning (artificial intelligence); least squares approximations; French Othello Federation; LSTD; board position; expert player behaviour; game trajectory; human competition; learning approach; least squares temporal difference learning; linear evaluation function; play imitation; policy learning; preference learning; Games; Humans; Learning systems; Machine learning; Trajectory; Vectors
Conference_Titel :
2012 IEEE Conference on Computational Intelligence and Games (CIG)
Conference_Location :
Granada, Spain
Print_ISBN :
978-1-4673-1193-9
Electronic_ISBN :
978-1-4673-1192-2
DOI :
10.1109/CIG.2012.6374141