DocumentCode :
1840814
Title :
An Othello evaluation function based on Temporal Difference Learning using probability of winning
Author :
Osaki, Yasuhiro ; Shibahara, Kazutomo ; Tajima, Yasuhiro ; Kotani, Yoshiyuki
Author_Institution :
Dept. of Comput. & Inf. Sci., Tokyo Univ. of Agric. & Technol., Koganei
fYear :
2008
fDate :
15-18 Dec. 2008
Firstpage :
205
Lastpage :
211
Abstract :
This paper presents a new reinforcement learning method, temporal difference learning with Monte Carlo simulation (TDMC), which combines temporal difference (TD) learning with the winning probability of each non-terminal position. Self-teaching evaluation functions for logic games have been studied for many years; however, few successful applications of TD have been reported. This is perhaps because the only reward observable in logic games is the final outcome, with no obvious rewards in non-terminal positions. TDMC(lambda) attempts to compensate for this problem by introducing winning probabilities, obtained through Monte Carlo simulation, as substitute rewards. With Othello as the testing environment, TDMC(lambda) yielded better learning results than TD(lambda).
Keywords :
Monte Carlo methods; computer games; learning (artificial intelligence); Monte Carlo simulation; Othello evaluation function; logic games; reinforcement learning method; self-teaching evaluation functions; temporal difference learning; winning probabilities; Agriculture; Computational modeling; Educational institutions; Learning systems; Logic; Optimization methods; Testing;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Computational Intelligence and Games, 2008. CIG '08. IEEE Symposium On
Conference_Location :
Perth, WA
Print_ISBN :
978-1-4244-2973-8
Electronic_ISBN :
978-1-4244-2974-5
Type :
conf
DOI :
10.1109/CIG.2008.5035641
Filename :
5035641