DocumentCode
2028028
Title
Reinforcement learning with state-dependent discount factor
Author
Yoshida, Norihiro ; Uchibe, Eiji ; Doya, Kenji
Author_Institution
Nara Inst. of Sci. & Technol. (NAIST), Nara, Japan
fYear
2013
fDate
18-22 Aug. 2013
Firstpage
1
Lastpage
6
Abstract
Conventional reinforcement learning algorithms have several parameters, called meta-parameters, that determine the characteristics of the learning process. In this study, we focus on the discount factor, which sets the time scale of the tradeoff between immediate and delayed rewards. The discount factor is usually treated as a constant, but here we introduce a state-dependent discount function and a new optimization criterion for reinforcement learning. We first derive a new algorithm under this criterion, named ExQ-learning, and prove that it converges with probability one to the action-value function that is optimal in the sense of the new criterion. We then present a framework that optimizes the discount factor and the discount function by means of an evolutionary algorithm. To validate the proposed method, we conduct a simple computer simulation and show that the proposed algorithm finds an appropriate state-dependent discount function that performs better than a constant discount factor.
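
For illustration, the following is a minimal sketch of a tabular Q-learning update in which the constant discount factor is replaced by a state-dependent discount function gamma(s'), the core idea the abstract describes. It is not the authors' ExQ-learning algorithm or code; the environment, the state/action sizes, and the gamma table are hypothetical placeholders.

import numpy as np

# Sketch: tabular Q-learning with a state-dependent discount gamma(s')
# in place of a constant discount factor. All quantities below are
# illustrative assumptions, not the paper's actual experimental setup.
n_states, n_actions = 5, 2
alpha, epsilon = 0.1, 0.1
rng = np.random.default_rng(0)

Q = np.zeros((n_states, n_actions))
gamma = rng.uniform(0.5, 0.99, size=n_states)  # hypothetical per-state discount

def step(s, a):
    # Toy transition/reward model standing in for a real environment.
    s_next = (s + a) % n_states
    reward = 1.0 if s_next == n_states - 1 else 0.0
    return s_next, reward

s = 0
for _ in range(10000):
    # Epsilon-greedy action selection.
    a = rng.integers(n_actions) if rng.random() < epsilon else int(np.argmax(Q[s]))
    s_next, r = step(s, a)
    # Q-learning update where the discount depends on the successor state.
    td_target = r + gamma[s_next] * np.max(Q[s_next])
    Q[s, a] += alpha * (td_target - Q[s, a])
    s = s_next

In the paper, such a gamma function is not hand-set as above but is optimized, together with the criterion, via an evolutionary algorithm.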
Keywords
learning (artificial intelligence); optimisation; ExQ-learning; computer simulation; constant discount factor; evolutionary algorithm; meta-parameters; optimal action-value function; optimization criterion; reinforcement learning algorithms; state-dependent discount factor; state-dependent discount function; Convergence; Equations; Green products; Learning (artificial intelligence); Linear programming; Mathematical model; Robots
fLanguage
English
Publisher
IEEE
Conference_Titel
2013 IEEE Third Joint International Conference on Development and Learning and Epigenetic Robotics (ICDL)
Conference_Location
Osaka, Japan
Type
conf
DOI
10.1109/DevLrn.2013.6652533
Filename
6652533
Link To Document