DocumentCode :
2777594
Title :
Value-gradient learning
Author :
Fairbank, Michael ; Alonso, Eduardo
Author_Institution :
Dept. of Comput., City Univ. London, London, UK
fYear :
2012
fDate :
10-15 June 2012
Firstpage :
1
Lastpage :
8
Abstract :
We describe an Adaptive Dynamic Programming algorithm VGL(λ) for learning a critic function over a large continuous state space. The algorithm, which requires a learned model of the environment, extends Dual Heuristic Dynamic Programming to include a bootstrapping parameter analogous to that used in the reinforcement learning algorithm TD(λ). We provide on-line and batch mode implementations of the algorithm, and summarise the theoretical relationships and motivations of using this method over its precursor algorithms Dual Heuristic Dynamic Programming and TD(λ). Experiments for control problems using a neural network and greedy policy are provided.
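The abstract says VGL(λ) extends Dual Heuristic Dynamic Programming (DHP) with a bootstrapping parameter analogous to the one in TD(λ). The Python block below is a minimal sketch of that idea under clearly stated assumptions, none of which come from this record:

- a hypothetical one-dimensional problem with a known model x' = x + DT·a and reward r = −(x² + a²);
- a fixed linear policy a = −K·x, used instead of the greedy policy the paper uses;
- a linear critic G̃(x) = w·x that outputs the value-gradient dV/dx directly, as in DHP;
- a truncated trajectory that bootstraps from the critic at its last state;
- a unit weighting Ω_t = 1.

The target recursion G'_t = Dr/Dx + Df/Dx·(λ·G'_{t+1} + (1−λ)·G̃(x_{t+1})), with D/Dx taken as a total derivative through the policy, is our reading of how λ blends DHP's one-step target (λ = 0) with the full trajectory gradient (λ = 1). It mirrors the λ-return of TD(λ). Any discount factor or non-identity Ω_t in the paper is not reproduced here. All names (K, DT, policy, step, critic_grad, rollout, vgl_targets, vgl_update) are hypothetical.

```python
# Minimal VGL(lambda)-style sketch on a HYPOTHETICAL 1-D problem.
# Assumptions: known model, fixed linear policy (not greedy), linear critic
# of the value-gradient, bootstrapping at the end of a truncated trajectory,
# and Omega_t = 1.

K = 0.5   # hypothetical policy gain: a = pi(x) = -K * x
DT = 1.0  # hypothetical step size in the model x' = x + DT * a


def policy(x):
    return -K * x


def step(x, a):
    """Hypothetical known model: returns (next_state, reward)."""
    return x + DT * a, -(x * x + a * a)


def critic_grad(w, x):
    """Critic output G~(x) = w * x, i.e. dV/dx for V(x) = 0.5 * w * x^2."""
    return w * x


def rollout(x0, steps):
    """Follow the fixed policy for `steps` transitions; returns x_0..x_T."""
    xs = [x0]
    for _ in range(steps):
        x_next, _ = step(xs[-1], policy(xs[-1]))
        xs.append(x_next)
    return xs


def vgl_targets(w, xs, lam):
    """Backward pass for lambda-weighted target gradients G'_t.

    G'_t = Dr/Dx_t + Df/Dx_t * (lam * G'_{t+1} + (1 - lam) * G~(x_{t+1})),
    with the truncated trajectory bootstrapping from G'_T = G~(x_T).
    """
    T = len(xs) - 1
    targets = [0.0] * T
    g_next = critic_grad(w, xs[T])  # bootstrap value-gradient at x_T
    for t in reversed(range(T)):
        x = xs[t]
        a = policy(x)
        dpi_dx = -K
        dr_dx = -2.0 * x + dpi_dx * (-2.0 * a)  # dr/dx + dpi/dx * dr/da
        df_dx = 1.0 + dpi_dx * DT               # df/dx + dpi/dx * df/da
        blend = lam * g_next + (1.0 - lam) * critic_grad(w, xs[t + 1])
        targets[t] = dr_dx + df_dx * blend
        g_next = targets[t]
    return targets


def vgl_update(w, xs, lam, alpha):
    """One batch step: w += alpha * sum_t dG~/dw * (G'_t - G~(x_t))."""
    targets = vgl_targets(w, xs, lam)
    delta = sum(x * (g - critic_grad(w, x)) for x, g in zip(xs[:-1], targets))
    return w + alpha * delta
```

With K = 0.5 the state halves at every step. Repeated calls such as `w = vgl_update(w, rollout(1.0, 10), lam, 0.1)` should drive w toward −10/3, the exact value-gradient coefficient of this toy problem, for any λ in [0, 1]. In this sketch λ changes only how much of each target is taken from the recorded trajectory and how much from the critic.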
Keywords :
dynamic programming; learning (artificial intelligence); adaptive dynamic programming; bootstrapping parameter; control problems; critic function; dual heuristic dynamic programming; greedy policy; large continuous state space; neural network; reinforcement learning; value-gradient learning; Approximation algorithms; Dynamic programming; Equations; Heuristic algorithms; Mathematical model; Trajectory; Vectors; Adaptive Dynamic Programming; DHP; Dual Heuristic Dynamic Programming; Value-Gradient Learning;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Neural Networks (IJCNN), The 2012 International Joint Conference on
Conference_Location :
Brisbane, QLD
ISSN :
2161-4393
Print_ISBN :
978-1-4673-1488-6
Electronic_ISBN :
2161-4393
Type :
conf
DOI :
10.1109/IJCNN.2012.6252791
Filename :
6252791