Reinforcement learning in the game of Othello: Learning against a fixed opponent and learning from self-play

Author

van der Ree, Michiel; Wiering, Marco

Author_Institution

Inst. of Artificial Intell. & Cognitive Eng., Univ. of Groningen, Groningen, Netherlands

fYear

2013

fDate

16-19 April 2013

Firstpage

108

Lastpage

115

Abstract

This paper compares three strategies for using reinforcement learning algorithms to let an artificial agent learn to play the game of Othello. The three strategies are: learning by self-play, learning from playing against a fixed opponent, and learning from playing against a fixed opponent while also learning from the opponent's moves. These strategies are evaluated for the algorithms Q-learning, Sarsa and TD-learning. The three reinforcement learning algorithms are combined with multi-layer perceptrons and trained and tested against three fixed opponents. It is found that the best learning strategy differs per algorithm. Q-learning and Sarsa perform best when trained against the fixed opponent they are also tested against, whereas TD-learning performs best when trained through self-play. Surprisingly, Q-learning and Sarsa outperform TD-learning against the stronger fixed opponents when all methods use their best strategy. Learning from the opponent's moves as well leads to worse results than learning only from the learning agent's own moves.

Keywords

computer games; game theory; learning (artificial intelligence); multi-agent systems; multilayer perceptrons; Othello game; Q-learning algorithm; Sarsa algorithm; TD-learning algorithm; artificial agent; reinforcement learning algorithms; self-play learning; Artificial neural networks; Games; Heuristic algorithms; Testing; Training

fLanguage

English

Publisher

IEEE

Conference_Titel

Adaptive Dynamic Programming And Reinforcement Learning (ADPRL), 2013 IEEE Symposium on

Conference_Location

Singapore

ISSN

2325-1824

Type

conf

DOI

10.1109/ADPRL.2013.6614996

Filename

6614996