DocumentCode
2028028
Title
Reinforcement learning with state-dependent discount factor
Author
Yoshida, Norihiro ; Uchibe, Eiji ; Doya, Kenji
Author_Institution
Nara Inst. of Sci. & Technol. (NAIST), Nara, Japan
fYear
2013
fDate
18-22 Aug. 2013
Firstpage
1
Lastpage
6
Abstract
Conventional reinforcement learning algorithms have several parameters, called meta-parameters, that determine the characteristics of the learning process. In this study, we focus on the discount factor, which sets the time scale of the tradeoff between immediate and delayed rewards. The discount factor is usually treated as a constant, but here we introduce a state-dependent discount function and a new optimization criterion for reinforcement learning. We first derive a new algorithm under this criterion, named ExQ-learning, and prove that it converges with probability one to the action-value function that is optimal in the sense of the new criterion. We then present a framework that optimizes the discount factor and the discount function by means of an evolutionary algorithm. To validate the proposed method, we conduct a simple computer simulation and show that the proposed algorithm finds an appropriate state-dependent discount function that performs better than a constant discount factor.
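
For illustration, the following is a minimal sketch of a tabular Q-learning update in which the constant discount factor is replaced by a state-dependent discount function gamma(s'), the core idea the abstract describes. It is not the authors' ExQ-learning algorithm or code; the environment, the state/action sizes, and the gamma table are hypothetical placeholders.

import numpy as np

# Sketch: tabular Q-learning with a state-dependent discount gamma(s')
# in place of a constant discount factor. All quantities below are
# illustrative assumptions, not the paper's actual experimental setup.
n_states, n_actions = 5, 2
alpha, epsilon = 0.1, 0.1
rng = np.random.default_rng(0)

Q = np.zeros((n_states, n_actions))
gamma = rng.uniform(0.5, 0.99, size=n_states)  # hypothetical per-state discount

def step(s, a):
    # Toy transition/reward model standing in for a real environment.
    s_next = (s + a) % n_states
    reward = 1.0 if s_next == n_states - 1 else 0.0
    return s_next, reward

s = 0
for _ in range(10000):
    # Epsilon-greedy action selection.
    a = rng.integers(n_actions) if rng.random() < epsilon else int(np.argmax(Q[s]))
    s_next, r = step(s, a)
    # Q-learning update where the discount depends on the successor state.
    td_target = r + gamma[s_next] * np.max(Q[s_next])
    Q[s, a] += alpha * (td_target - Q[s, a])
    s = s_next

In the paper, such a gamma function is not hand-set as above but is optimized, together with the criterion, via an evolutionary algorithm.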
Keywords
learning (artificial intelligence); optimisation; ExQ-learning; computer simulation; constant discount factor; evolutionary algorithm; meta-parameters; optimal action-value function; optimization criterion; reinforcement learning algorithms; state-dependent discount factor; state-dependent discount function; Convergence; Equations; Green products; Learning (artificial intelligence); Linear programming; Mathematical model; Robots
fLanguage
English
Publisher
IEEE
Conference_Titel
2013 IEEE Third Joint International Conference on Development and Learning and Epigenetic Robotics (ICDL)
Conference_Location
Osaka, Japan
Type
conf
DOI
10.1109/DevLrn.2013.6652533
Filename
6652533
Link To Document