DocumentCode :
3756873
Title :
Performance Investigation of UCB Policy in Q-learning
Author :
Koki Saito;Akira Notsu;Seiki Ubukata;Katsuhiro Honda
Author_Institution :
Dept. of Comput. Sci. &
fYear :
2015
Firstpage :
777
Lastpage :
780
Abstract :
In this paper, we investigated the performance and usability of the UCBQ algorithm proposed in previous research. UCBQ applies UCB, a bandit algorithm, to Q-learning so that it can balance exploitation and exploration. In the previous research, we confirmed that it achieved effective learning in a partially observable Markov decision process, using a shortest path problem in a continuous state space. Here, we examined it numerically in a variety of simpler learning situations, namely two-dimensional goal search problems in a Markov decision process, and compared it with previous methods. The results confirmed that it performed better than the other methods.
Keywords :
"Usability","Markov processes","Search problems","Damping","Shortest path problem","Learning (artificial intelligence)","Upper bound"
Publisher :
IEEE
Conference_Titel :
Machine Learning and Applications (ICMLA), 2015 IEEE 14th International Conference on
Type :
conf
DOI :
10.1109/ICMLA.2015.59
Filename :
7424416