Title :
Value-gradient learning
Author :
Fairbank, Michael ; Alonso, Eduardo
Author_Institution :
Dept. of Comput., City Univ. London, London, UK
Abstract :
We describe an Adaptive Dynamic Programming algorithm, VGL(λ), for learning a critic function over a large continuous state space. The algorithm, which requires a learned model of the environment, extends Dual Heuristic Dynamic Programming to include a bootstrapping parameter analogous to that used in the reinforcement learning algorithm TD(λ). We provide on-line and batch-mode implementations of the algorithm, and summarise the theoretical relationships to its precursor algorithms, Dual Heuristic Dynamic Programming and TD(λ), and the motivations for using this method over them. Experiments on control problems using a neural network and a greedy policy are provided.
Keywords :
dynamic programming; learning (artificial intelligence); adaptive dynamic programming; bootstrapping parameter; control problems; critic function; dual heuristic dynamic programming; greedy policy; large continuous state space; neural network; reinforcement learning; value-gradient learning; Approximation algorithms; Dynamic programming; Equations; Heuristic algorithms; Mathematical model; Trajectory; Vectors; Adaptive Dynamic Programming; DHP; Dual Heuristic Dynamic Programming; Value-Gradient Learning;
Conference_Title :
Neural Networks (IJCNN), The 2012 International Joint Conference on
Conference_Location :
Brisbane, QLD
Print_ISBN :
978-1-4673-1488-6
Electronic_ISBN :
2161-4393
DOI :
10.1109/IJCNN.2012.6252791