Regional Information Center for Science and Technology - Performance Investigation of UCB Policy in Q-learning

DocumentCode :

3756873

Title :

Performance Investigation of UCB Policy in Q-learning

Author :

Koki Saito;Akira Notsu;Seiki Ubukata;Katsuhiro Honda

Author_Institution :

Dept. of Comput. Sci. &

fYear :

2015

Firstpage :

777

Lastpage :

780

Abstract :

In this paper, we investigated the performance and usability of the UCBQ algorithm proposed in previous research. UCBQ applies UCB, one of the bandit algorithms, to Q-learning so that the agent can balance exploitation and exploration. In the previous research, we confirmed that it achieved effective learning in a partially observable Markov decision process, using a shortest-path problem with a continuous state space. Here, we examined it numerically in a variety of simpler learning situations based on a two-dimensional goal-search problem in a Markov decision process, and compared it with previous methods. As a result, we confirmed that it performed better than the other methods.
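The abstract does not give the UCBQ update rule, so the following is only a minimal sketch of the general idea of applying UCB to Q-learning. It assumes a UCB1-style exploration bonus, c·sqrt(ln N(s) / N(s,a)), added to tabular Q-values, on a hypothetical 5×5 goal-search grid. The grid size, the reward of 1.0 at the goal, the constant `c`, and the helper names `step`, `ucb_action`, and `train` are illustrative assumptions, not the authors' formulation.

```python
import math

# Hypothetical 2-D goal-search gridworld (sizes and rewards are assumptions).
WIDTH, HEIGHT = 5, 5
START, GOAL = (0, 0), (4, 4)
ACTIONS = [(1, 0), (-1, 0), (0, 1), (0, -1)]  # right, left, up, down


def step(state, action):
    """Move within the grid (walls clamp); reward 1.0 on reaching the goal."""
    x = min(max(state[0] + action[0], 0), WIDTH - 1)
    y = min(max(state[1] + action[1], 0), HEIGHT - 1)
    nxt = (x, y)
    return nxt, (1.0 if nxt == GOAL else 0.0), nxt == GOAL


def ucb_action(q, counts, state, c):
    """Assumed UCB1-style rule: argmax_a Q(s,a) + c*sqrt(ln N(s) / N(s,a))."""
    n_s = sum(counts.get((state, a), 0) for a in range(len(ACTIONS)))
    best, best_val = None, -math.inf
    for a in range(len(ACTIONS)):
        n_sa = counts.get((state, a), 0)
        if n_sa == 0:
            return a  # try every action in a state at least once
        val = q.get((state, a), 0.0) + c * math.sqrt(math.log(n_s) / n_sa)
        if val > best_val:
            best, best_val = a, val
    return best


def train(episodes=200, alpha=0.1, gamma=0.95, c=1.0, max_steps=200):
    """Tabular Q-learning with UCB action selection; returns Q and episode lengths."""
    q, counts, lengths = {}, {}, []
    for _ in range(episodes):
        s = START
        for t in range(1, max_steps + 1):
            a = ucb_action(q, counts, s, c)
            counts[(s, a)] = counts.get((s, a), 0) + 1
            s2, r, done = step(s, ACTIONS[a])
            # Standard one-step Q-learning target; no bootstrap at the goal.
            best_next = max(q.get((s2, b), 0.0) for b in range(len(ACTIONS)))
            target = r if done else r + gamma * best_next
            old = q.get((s, a), 0.0)
            q[(s, a)] = old + alpha * (target - old)
            s = s2
            if done:
                break
        lengths.append(t)
    return q, lengths


if __name__ == "__main__":
    q, lengths = train()
    print("first episodes:", lengths[:5], "last episodes:", lengths[-5:])
```

The bonus term shrinks as N(s,a) grows, so rarely tried actions are favoured early and the greedy Q-value dominates later. This is the exploitation/exploration balance the abstract attributes to UCBQ, though the paper's exact bonus and any damping it uses may differ.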

Keywords :

"Usability","Markov processes","Search problems","Damping","Shortest path problem","Learning (artificial intelligence)","Upper bound"

Publisher :

IEEE

Conference_Title :

Machine Learning and Applications (ICMLA), 2015 IEEE 14th International Conference on

Type :

conf

DOI :

10.1109/ICMLA.2015.59

Filename :

7424416

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3756873