Regional Information Center for Science and Technology (RICeST) - Second order fluctuations of TD(λ) and a positive real condition

DocumentCode :

2575928

Title :

Second order fluctuations of TD(λ) and a positive real condition

Author :

Solo, Victor

Author_Institution :

Sch. of Electr. Eng. & Telecommun., Univ. of New South Wales, Sydney, NSW, Australia

fYear :

2010

fDate :

15-17 Dec. 2010

Firstpage :

2849

Lastpage :

2854

Abstract :

We analyze the behaviour of a generalised TD(λ) algorithm with constant step size. We first consider linear estimation of the optimal cost. Using realisation-wise averaging analysis, we prove, for the first time, boundedness under a positive real condition. We also provide, for the first time, a detailed analysis of the second order fluctuations of a TD(λ) type algorithm. We then consider nonlinear estimation of the optimal cost.
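For orientation, the following is a minimal sketch of the textbook TD(λ) recursion with a linear cost estimate and a constant step size. It is not the generalised algorithm analysed in the paper, which this record does not spell out. The function name `td_lambda`, its parameters, and the two-state example chain are all illustrative assumptions.

```python
import random


def td_lambda(transitions, costs, features, gamma=0.5, lam=0.5,
              step=0.05, n_steps=20000, seed=0):
    """Textbook TD(lambda), linear cost estimate, constant step size.

    All names here are hypothetical; this is an illustrative sketch,
    not the generalised algorithm of the paper.
    """
    rng = random.Random(seed)
    n_states = len(transitions)
    n_features = len(features[0])
    theta = [0.0] * n_features   # weights of the linear cost estimate
    trace = [0.0] * n_features   # eligibility trace
    state = 0
    for _ in range(n_steps):
        # Sample the next state from the current row of the transition matrix.
        next_state = rng.choices(range(n_states), weights=transitions[state])[0]
        phi, phi_next = features[state], features[next_state]
        value = sum(w * f for w, f in zip(theta, phi))
        value_next = sum(w * f for w, f in zip(theta, phi_next))
        # Temporal-difference error for the discounted cost.
        delta = costs[state] + gamma * value_next - value
        # Decay the trace by gamma * lambda, then add the current features.
        trace = [gamma * lam * z + f for z, f in zip(trace, phi)]
        # Constant step size: the gain does not shrink over time.
        theta = [w + step * delta * z for w, z in zip(theta, trace)]
        state = next_state
    return theta


# Assumed example: a deterministic two-state cycle with one-hot features.
P = [[0.0, 1.0], [1.0, 0.0]]
c = [1.0, 0.0]
Phi = [[1.0, 0.0], [0.0, 1.0]]
theta = td_lambda(P, c, Phi)
```

In this noise-free example the discounted costs solve V0 = 1 + 0.5·V1 and V1 = 0.5·V0, so the weights should settle near 4/3 and 2/3. With random transitions, a constant step size generally leaves the estimate fluctuating around its limit instead of settling exactly, which is the regime the abstract refers to.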

Keywords :

cost optimal control; learning (artificial intelligence); nonlinear estimation; TD algorithm; linear estimation; optimal cost; realisation wise averaging analysis; second order fluctuation; temporal difference learning algorithm; Adaptive algorithms; Algorithm design and analysis; Convergence; Dynamic programming; Equations; Estimation; Heuristic algorithms

fLanguage :

English

Publisher :

IEEE

Conference_Title :

2010 49th IEEE Conference on Decision and Control (CDC)

Conference_Location :

Atlanta, GA

ISSN :

0743-1546

Print_ISBN :

978-1-4244-7745-6

Type :

conf

DOI :

10.1109/CDC.2010.5717655

Filename :

5717655

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2575928