Title :
An Othello evaluation function based on Temporal Difference Learning using probability of winning
Author :
Osaki, Yasuhiro ; Shibahara, Kazutomo ; Tajima, Yasuhiro ; Kotani, Yoshiyuki
Author_Institution :
Dept. of Comput. & Inf. Sci., Tokyo Univ. of Agric. & Technol., Koganei
Abstract :
This paper presents a new reinforcement learning method, temporal difference learning with Monte Carlo simulation (TDMC), which combines temporal difference (TD) learning with the winning probability of each non-terminal position. Self-teaching evaluation functions for logic games have been studied for many years, but few successful applications of TD have been reported. This is perhaps because the only reward observable in logic games is the final outcome, with no obvious rewards in non-terminal positions. TDMC(lambda) attempts to compensate for this problem by introducing winning probabilities, obtained through Monte Carlo simulation, as substitute rewards. With Othello as the test environment, TDMC(lambda) yielded better learning results than TD(lambda).
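The idea of substituting simulated winning probabilities for missing intermediate rewards can be illustrated with a minimal Python sketch. It is not the authors' formulation: the abstract does not give the update rule, so the code assumes a standard backward-view TD(lambda) update with a linear evaluation function, and it replaces Othello with a toy one-dimensional random walk. All names (`random_walk_playout`, `estimate_win_probability`, `td_lambda_episode`) and all parameter values are hypothetical.

```python
import random


def random_walk_playout(position, rng, goal=6):
    """Toy stand-in for a random game playout (assumption, not Othello).

    Steps left or right at random until reaching 0 (loss) or `goal` (win).
    """
    while 0 < position < goal:
        position += rng.choice((-1, 1))
    return position == goal


def estimate_win_probability(position, playout, n_playouts, rng):
    """Fraction of `n_playouts` random playouts from `position` that end in a win."""
    wins = sum(1 for _ in range(n_playouts) if playout(position, rng))
    return wins / n_playouts


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def td_lambda_episode(weights, features, rewards, alpha=0.1, gamma=0.9, lam=0.7):
    """One backward-view TD(lambda) pass with a linear value function.

    features[t] is the feature vector of position t and rewards[t] is the
    reward credited at step t. In this sketch the rewards are Monte Carlo
    winning-probability estimates rather than the final game outcome.
    The value after the last position is taken to be 0.
    """
    w = list(weights)
    trace = [0.0] * len(w)
    for t in range(len(features)):
        v = dot(w, features[t])
        v_next = dot(w, features[t + 1]) if t + 1 < len(features) else 0.0
        delta = rewards[t] + gamma * v_next - v  # TD error
        # Accumulating eligibility trace for a linear approximator.
        trace = [gamma * lam * e + x for e, x in zip(trace, features[t])]
        w = [wi + alpha * delta * e for wi, e in zip(w, trace)]
    return w


# Usage: a short trajectory of non-terminal positions in the toy game.
rng = random.Random(0)
trajectory = [3, 4, 5]
substitute_rewards = [
    estimate_win_probability(p, random_walk_playout, 200, rng) for p in trajectory
]
features = [[1.0, p / 6.0] for p in trajectory]  # bias term + scaled position
weights = td_lambda_episode([0.0, 0.0], features, substitute_rewards)
print(weights)
```

Every non-terminal position in the trajectory receives a learning signal here, whereas a plain TD(lambda) setup for such games would see a nonzero reward only at the end of the game.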
Keywords :
Monte Carlo methods; computer games; learning (artificial intelligence); Monte Carlo simulation; Othello evaluation function; logic games; reinforcement learning method; self-teaching evaluation functions; temporal difference learning; winning probabilities; Agriculture; Computational modeling; Educational institutions; Learning systems; Logic; Optimization methods; Testing;
Conference_Title :
Computational Intelligence and Games, 2008. CIG '08. IEEE Symposium On
Conference_Location :
Perth, WA
Print_ISBN :
978-1-4244-2973-8
Electronic_ISBN :
978-1-4244-2974-5
DOI :
10.1109/CIG.2008.5035641