Value-gradient learning

Author

Fairbank, Michael; Alonso, Eduardo

Author_Institution

Dept. of Comput., City Univ. London, London, UK

fYear

2012

fDate

10-15 June 2012

Firstpage

1

Lastpage

8

Abstract

We describe an Adaptive Dynamic Programming algorithm, VGL(λ), for learning a critic function over a large continuous state space. The algorithm, which requires a learned model of the environment, extends Dual Heuristic Dynamic Programming to include a bootstrapping parameter analogous to that used in the reinforcement learning algorithm TD(λ). We provide on-line and batch mode implementations of the algorithm, and summarise the theoretical relationships and motivations for using this method over its precursor algorithms Dual Heuristic Dynamic Programming and TD(λ). Experiments on control problems using a neural network and a greedy policy are provided.
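The abstract says only that VGL(λ) adds a TD(λ)-style bootstrapping parameter to the value-gradient critic of Dual Heuristic Dynamic Programming. To illustrate that idea, here is a minimal sketch of a λ-blended target value-gradient computed backwards along one trajectory. It is not the authors' published algorithm. The assumptions are ours: a deterministic learned model x_{t+1} = f(x_t, a_t) whose Jacobians are available, a discount factor γ, actions held fixed so that no greedy-policy derivative terms appear, and truncation at x_T, where the target bootstraps fully from the critic. The names `mat_t_vec` and `lambda_gradient_targets` are hypothetical. The sketched recursion is G'_t = ∂r/∂x_t + γ (∂f/∂x_t)ᵀ (λ G'_{t+1} + (1 − λ) G̃_{t+1}), where G̃ is the critic's estimate of the value gradient.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def mat_t_vec(m: Matrix, v: Vector) -> Vector:
    """Return m^T v for a Jacobian m with m[i][j] = d f_i / d x_j."""
    rows, cols = len(m), len(m[0])
    return [sum(m[i][j] * v[i] for i in range(rows)) for j in range(cols)]


def lambda_gradient_targets(
    reward_grads: List[Vector],
    model_jacobians: List[Matrix],
    critic_grads: List[Vector],
    gamma: float,
    lam: float,
) -> List[Vector]:
    """Hypothetical sketch: lambda-blended target value-gradients G'_t, t = 0..T-1.

    reward_grads[t]:    dr/dx evaluated at x_t (actions held fixed, an assumption)
    model_jacobians[t]: df/dx of the learned model evaluated at x_t
    critic_grads[t]:    critic output G~(x_{t+1}), i.e. at the successor state
    """
    T = len(reward_grads)
    targets: List[Vector] = [[] for _ in range(T)]
    # The trajectory is truncated at x_T, so bootstrap fully there: G'_T := G~(x_T).
    next_target = critic_grads[T - 1]
    for t in reversed(range(T)):
        # Mix the longer-horizon target with the critic's estimate at x_{t+1}.
        blend = [lam * a + (1.0 - lam) * b
                 for a, b in zip(next_target, critic_grads[t])]
        # Chain rule through the model: carry the blended gradient back to x_t.
        back = mat_t_vec(model_jacobians[t], blend)
        targets[t] = [dr + gamma * db for dr, db in zip(reward_grads[t], back)]
        next_target = targets[t]
    return targets
```

In this sketch, setting λ = 0 gives a one-step, DHP-style target that relies on the critic at the next state, while λ = 1 propagates reward gradients along the whole trajectory and uses the critic only at x_T, mirroring how TD(λ) interpolates between one-step and Monte Carlo returns. A critic would then be trained to move its outputs G̃(x_t) toward these targets.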

Keywords

dynamic programming; learning (artificial intelligence); adaptive dynamic programming; bootstrapping parameter; control problems; critic function; dual heuristic dynamic programming; greedy policy; large continuous state space; neural network; reinforcement learning; value-gradient learning; Approximation algorithms; Dynamic programming; Equations; Heuristic algorithms; Mathematical model; Trajectory; Vectors; Adaptive Dynamic Programming; DHP; Dual Heuristic Dynamic Programming; Value-Gradient Learning;

fLanguage

English

Publisher

IEEE

Conference_Title

The 2012 International Joint Conference on Neural Networks (IJCNN)

Conference_Location

Brisbane, QLD

ISSN

2161-4393

Print_ISBN

978-1-4673-1488-6

Electronic_ISBN

2161-4393

Type

conf

DOI

10.1109/IJCNN.2012.6252791

Filename

6252791