Regional Information Center for Science and Technology - The divergence of reinforcement learning algorithms with value-iteration and function approximation

DocumentCode :

2777605

Title :

The divergence of reinforcement learning algorithms with value-iteration and function approximation

Author :

Fairbank, Michael ; Alonso, Eduardo

Author_Institution :

Dept. of Comput., City Univ. London, London, UK

fYear :

2012

fDate :

10-15 June 2012

Firstpage :

Lastpage :

Abstract :

This paper gives specific divergence examples of value-iteration for several major Reinforcement Learning and Adaptive Dynamic Programming algorithms, when using a function approximator for the value function. These divergence examples differ from previous divergence examples in the literature, in that they are applicable for a greedy policy, i.e. in a “value iteration” scenario. Perhaps surprisingly, with a greedy policy, it is also possible to get divergence for the algorithms TD(1) and Sarsa(1). In addition to these divergences, we also achieve divergence for the Adaptive Dynamic Programming algorithms HDP, DHP and GDHP.
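The abstract describes value iteration that diverges when the value function is approximated. The sketch below gives a concrete feel for that failure mode. It is not one of the paper's examples. It is a simplified, deterministic form of the well-known two-state counterexample of Tsitsiklis and Van Roy, and the features, transitions and zero rewards are assumptions chosen for illustration. Each iteration computes Bellman targets from the current linear value function V(s) = w * phi(s). It then refits w by least squares over both states.

```python
# Illustrative sketch only (assumption: NOT the construction used in this paper).
# Simplified two-state counterexample: linear value function V(s) = w * phi(s),
# both states transition deterministically to state 2, all rewards are 0.


def fitted_value_iteration(w0: float, gamma: float, iterations: int) -> list[float]:
    """Return the weight sequence produced by least-squares fitted value iteration."""
    phi = {1: 1.0, 2: 2.0}     # hypothetical features: phi(1) = 1, phi(2) = 2
    next_state = {1: 2, 2: 2}  # hypothetical dynamics: every state moves to state 2
    w = w0
    weights = [w]
    for _ in range(iterations):
        # Bellman targets r + gamma * V(s') with r = 0 for every state.
        targets = {s: gamma * w * phi[next_state[s]] for s in phi}
        # Closed-form least squares for a single weight:
        # w' = sum_s phi(s) * target(s) / sum_s phi(s)^2
        numerator = sum(phi[s] * targets[s] for s in phi)
        denominator = sum(phi[s] ** 2 for s in phi)
        w = numerator / denominator
        weights.append(w)
    return weights


if __name__ == "__main__":
    for gamma in (0.8, 0.9):
        ws = fitted_value_iteration(1.0, gamma, 50)
        print(f"gamma={gamma}: w after 50 iterations = {ws[-1]:.4f}")
```

Working through one step by hand, each iteration multiplies the weight by 6*gamma/5. The true value function is identically zero, yet w grows without bound whenever gamma > 5/6. This toy case has a single action, so the greedy policy is trivial. It does not reproduce the paper's greedy-policy divergences for TD(1), Sarsa(1), HDP, DHP or GDHP.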

Keywords :

dynamic programming; function approximation; learning (artificial intelligence); GDHP; HDP; Sarsa algorithm; TD algorithm; adaptive dynamic programming algorithm; function approximation; greedy policy; reinforcement learning; value function; value-iteration; Approximation algorithms; Equations; Function approximation; Heuristic algorithms; Trajectory; Vectors; Adaptive Dynamic Programming; Divergence; Greedy Policy; Reinforcement Learning; Value Iteration;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Neural Networks (IJCNN), The 2012 International Joint Conference on

Conference_Location :

Brisbane, QLD

ISSN :

2161-4393

Print_ISBN :

978-1-4673-1488-6

Electronic_ISBN :

2161-4393

Type :

conf

DOI :

10.1109/IJCNN.2012.6252792

Filename :

6252792

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2777605