Regional Information Center for Science and Technology (RICEST)

DocumentCode :

3563673

Title :

Discounted UCB1-tuned for Q-learning

Author :

Saito, Koki ; Notsu, Akira ; Honda, Katsuhiro

Author_Institution :

Dept. of Comput. Sci. & Intell. Syst., Osaka Prefecture Univ., Sakai, Japan

fYear :

2014

Firstpage :

966

Lastpage :

970

Abstract :

Discounted UCB1-tuned has been proposed as a method for choosing actions in the multi-armed bandit problem. The algorithm is a selection method tuned to balance exploration and exploitation by using a weighted value and a weighted variance. In this paper, we propose a method for applying Discounted UCB1-tuned to Q-learning and experimentally evaluate its performance on a shortest path problem in a continuous state space.
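
The abstract does not state the selection formula, so the following is only a minimal sketch of how a discounted UCB1-tuned rule might look for a plain multi-armed bandit. It assumes the standard UCB1-tuned index (mean plus an exploration bonus capped by 1/4) with the counts, mean, and variance replaced by exponentially discounted versions, and rewards scaled to [0, 1]. The class name `DiscountedUCB1Tuned`, the parameter `gamma`, and the way discounting is applied are illustrative assumptions, not the authors' definitions. An application to Q-learning would presumably keep such statistics per state-action pair, which is not shown here.

```python
import math


class DiscountedUCB1Tuned:
    """Bandit action selector with exponentially discounted statistics.

    Assumption: every stored statistic decays by `gamma` at each step;
    the paper's exact weighting scheme may differ.
    """

    def __init__(self, n_arms, gamma=0.99):
        self.gamma = gamma
        self.counts = [0.0] * n_arms   # discounted pull counts
        self.sums = [0.0] * n_arms     # discounted reward sums
        self.sq_sums = [0.0] * n_arms  # discounted squared-reward sums

    def select(self):
        # Try every arm once before using the index.
        for arm, count in enumerate(self.counts):
            if count < 1e-12:
                return arm
        log_total = math.log(max(sum(self.counts), 1.0))
        best_arm, best_index = 0, -math.inf
        for arm, count in enumerate(self.counts):
            mean = self.sums[arm] / count
            # Weighted variance, clamped against rounding error.
            var = max(0.0, self.sq_sums[arm] / count - mean * mean)
            # UCB1-tuned variance bound, capped at 1/4 for rewards in [0, 1].
            v = var + math.sqrt(2.0 * log_total / count)
            index = mean + math.sqrt((log_total / count) * min(0.25, v))
            if index > best_index:
                best_arm, best_index = arm, index
        return best_arm

    def update(self, arm, reward):
        # Decay all statistics, then add the new observation.
        for i in range(len(self.counts)):
            self.counts[i] *= self.gamma
            self.sums[i] *= self.gamma
            self.sq_sums[i] *= self.gamma
        self.counts[arm] += 1.0
        self.sums[arm] += reward
        self.sq_sums[arm] += reward * reward
```

Because old observations fade, an arm that has not been tried recently regains a large exploration bonus, which is what lets a rule of this kind track changing value estimates such as those produced during Q-learning.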

Keywords :

estimation theory; learning (artificial intelligence); Q-learning; continuous state spaces shortest path problem; discounted UCB1-tuned; multi-armed bandit problem; Computer science; Computers; Damping; Learning (artificial intelligence); Shortest path problem; Standards; Vectors;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

2014 Joint 7th International Conference on Soft Computing and Intelligent Systems (SCIS) and 15th International Symposium on Advanced Intelligent Systems (ISIS)

Type :

conf

DOI :

10.1109/SCIS-ISIS.2014.7044672

Filename :

7044672

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3563673