Title :
Second order fluctuations of TD(λ) and a positive real condition
Author_Institution :
Sch. of Electr. Eng. & Telecommun., Univ. of New South Wales, Sydney, NSW, Australia
Abstract :
We analyse the behaviour of a generalised TD(λ) algorithm with constant step size. We first consider linear estimation of the optimal cost. Using realisation-wise averaging analysis, we prove, for the first time, boundedness under a positive real condition. We also provide, for the first time, a detailed analysis of the second order fluctuations of a TD(λ) type algorithm. We then consider nonlinear estimation of the optimal cost.
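For readers unfamiliar with the setting, the following is a minimal sketch of textbook TD(λ) with a linear cost estimate and a constant step size. It is not the generalised algorithm analysed in the paper, whose exact form the abstract does not give. The function name `td_lambda`, the two-state chain, the costs, the discount factor and the one-hot features are all hypothetical choices made for illustration.

```python
import random


def td_lambda(transitions, costs, features, gamma=0.9, lam=0.5,
              step=0.002, n_steps=100_000, seed=0):
    """Textbook TD(lambda) with linear estimation and a constant step size.

    transitions[s] : list of transition probabilities out of state s
    costs[s]       : one-step cost incurred in state s
    features[s]    : feature vector phi(s); the estimate is theta . phi(s)
    All argument names and defaults are illustrative assumptions.
    """
    rng = random.Random(seed)
    n_states = len(costs)
    dim = len(features[0])
    theta = [0.0] * dim   # weights of the linear cost estimate
    trace = [0.0] * dim   # eligibility trace
    state = 0
    for _ in range(n_steps):
        # Sample the next state of the Markov chain.
        next_state = rng.choices(range(n_states),
                                 weights=transitions[state])[0]
        phi = features[state]
        phi_next = features[next_state]
        # Eligibility trace: z <- gamma * lambda * z + phi(s).
        trace = [gamma * lam * z + f for z, f in zip(trace, phi)]
        # Temporal difference: cost + gamma * J(s') - J(s).
        value = sum(w * f for w, f in zip(theta, phi))
        value_next = sum(w * f for w, f in zip(theta, phi_next))
        delta = costs[state] + gamma * value_next - value
        # Constant step size: the step is never decayed towards zero.
        theta = [w + step * delta * z for w, z in zip(theta, trace)]
        state = next_state
    return theta


# Hypothetical two-state chain with one-hot (tabular) features.
P = [[0.5, 0.5], [0.5, 0.5]]
c = [1.0, 0.0]
phi = [[1.0, 0.0], [0.0, 1.0]]
theta = td_lambda(P, c, phi)
print(theta)
```

For this toy chain the exact discounted cost-to-go is (5.5, 4.5). Because the step size is held constant, the iterates do not settle exactly on that vector but keep fluctuating in a neighbourhood of it. The size of such residual fluctuations is the kind of quantity a second order analysis describes.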
Keywords :
cost optimal control; learning (artificial intelligence); nonlinear estimation; TD algorithm; linear estimation; optimal cost; realisation wise averaging analysis; second order fluctuation; temporal difference learning algorithm; Adaptive algorithms; Algorithm design and analysis; Convergence; Dynamic programming; Equations; Estimation; Heuristic algorithms
Conference_Title :
Decision and Control (CDC), 2010 49th IEEE Conference on
Conference_Location :
Atlanta, GA
Print_ISBN :
978-1-4244-7745-6
DOI :
10.1109/CDC.2010.5717655