DocumentCode :
2575928
Title :
Second order fluctuations of TD(λ) and a positive real condition
Author :
Solo, Victor
Author_Institution :
Sch. of Electr. Eng. & Telecommun., Univ. of New South Wales, Sydney, NSW, Australia
fYear :
2010
fDate :
15-17 Dec. 2010
Firstpage :
2849
Lastpage :
2854
Abstract :
We analyze the behaviour of a generalised TD(λ) algorithm with constant step size. We first consider linear estimation of the optimal cost. Using realisation-wise averaging analysis, we prove, for the first time, boundedness under a positive real condition. We also provide, for the first time, a detailed analysis of the second order fluctuations of a TD(λ) type algorithm. We then consider nonlinear estimation of the optimal cost.
Keywords :
cost optimal control; learning (artificial intelligence); nonlinear estimation; TD algorithm; linear estimation; optimal cost; realisation wise averaging analysis; second order fluctuation; temporal difference learning algorithm; Adaptive algorithms; Algorithm design and analysis; Convergence; Dynamic programming; Equations; Estimation; Heuristic algorithms
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Decision and Control (CDC), 2010 49th IEEE Conference on
Conference_Location :
Atlanta, GA
ISSN :
0743-1546
Print_ISBN :
978-1-4244-7745-6
Type :
conf
DOI :
10.1109/CDC.2010.5717655
Filename :
5717655