• DocumentCode
    3269575
  • Title
    Reinforcement learning in the game of Othello: Learning against a fixed opponent and learning from self-play
  • Author
    van der Ree, Michiel ; Wiering, Marco
  • Author_Institution
    Inst. of Artificial Intell. & Cognitive Eng., Univ. of Groningen, Groningen, Netherlands
  • fYear
    2013
  • fDate
    16-19 April 2013
  • Firstpage
    108
  • Lastpage
    115
  • Abstract
    This paper compares three strategies in using reinforcement learning algorithms to let an artificial agent learn to play the game of Othello. The three strategies that are compared are: Learning by self-play, learning from playing against a fixed opponent, and learning from playing against a fixed opponent while learning from the opponent's moves as well. These issues are considered for the algorithms Q-learning, Sarsa and TD-learning. These three reinforcement learning algorithms are combined with multi-layer perceptrons and trained and tested against three fixed opponents. It is found that the best strategy of learning differs per algorithm. Q-learning and Sarsa perform best when trained against the fixed opponent they are also tested against, whereas TD-learning performs best when trained through self-play. Surprisingly, Q-learning and Sarsa outperform TD-learning against the stronger fixed opponents, when all methods use their best strategy. Learning from the opponent's moves as well leads to worse results compared to learning only from the learning agent's own moves.
  • Keywords
    computer games; game theory; learning (artificial intelligence); multi-agent systems; multilayer perceptrons; Othello game; Q-learning algorithm; Sarsa algorithm; TD-learning algorithm; artificial agent; multilayer perceptrons; reinforcement learning algorithms; self-play learning; Artificial neural networks; Games; Heuristic algorithms; Learning (artificial intelligence); Testing; Training;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Adaptive Dynamic Programming And Reinforcement Learning (ADPRL), 2013 IEEE Symposium on
  • Conference_Location
    Singapore
  • ISSN
    2325-1824
  • Type
    conf
  • DOI
    10.1109/ADPRL.2013.6614996
  • Filename
    6614996