Regional Information Center for Science and Technology - Statistically linearized least-squares temporal differences

DocumentCode :

1866921

Title :

Statistically linearized least-squares temporal differences

Author :

Geist, Matthieu ; Pietquin, Olivier

Author_Institution :

IMS Res. Group, Supelec, Metz, France

fYear :

2010

fDate :

18-20 Oct. 2010

Firstpage :

450

Lastpage :

457

Abstract :

A common drawback of standard reinforcement learning algorithms is their inability to scale up to real-world problems. For this reason, an important current research trend is (state-action) value function approximation. A prominent value function approximator is the least-squares temporal differences (LSTD) algorithm. However, for technical reasons, linearity is mandatory: the parameterization of the value function must be linear (compact nonlinear representations are not allowed) and only the Bellman evaluation operator can be considered (imposing policy-iteration-like schemes). In this paper, this restriction of LSTD is lifted thanks to a derivative-free statistical linearization approach. This way, nonlinear parameterizations and the Bellman optimality operator can be taken into account (the latter allows value-iteration-like schemes). The efficiency of the resulting algorithms is demonstrated using a linear parameterization and neural networks, as well as on a Q-learning-like problem. A theoretical analysis is also provided.
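
For orientation, the standard linear LSTD estimator that the abstract takes as its starting point can be sketched as follows. This is a minimal illustration of textbook batch LSTD under the Bellman evaluation operator, not the statistically linearized variant proposed in the paper. The function names (`lstd`, `one_hot`), the small `ridge` regularizer, and the two-state toy chain are assumptions made for the example and do not come from the record.

```python
import numpy as np


def lstd(transitions, features, gamma=0.9, ridge=1e-6):
    """Batch LSTD for a linear value function V(s) ~ features(s) @ theta.

    transitions: list of (state, reward, next_state) sampled under a fixed policy.
    features:    hypothetical helper mapping a state to a 1-D feature vector.
    ridge:       small regularizer (an assumption) that keeps A invertible.
    """
    d = np.asarray(features(transitions[0][0]), dtype=float).shape[0]
    A = ridge * np.eye(d)
    b = np.zeros(d)
    for s, r, s_next in transitions:
        phi = np.asarray(features(s), dtype=float)
        phi_next = np.asarray(features(s_next), dtype=float)
        # Accumulate A = sum phi (phi - gamma * phi')^T and b = sum phi * r
        A += np.outer(phi, phi - gamma * phi_next)
        b += phi * r
    # theta solves A theta = b
    return np.linalg.solve(A, b)


def one_hot(state, n_states=2):
    """Tabular (one-hot) features for the toy chain below."""
    v = np.zeros(n_states)
    v[state] = 1.0
    return v


# Toy chain: state 0 moves to state 1 with reward 0; state 1 loops with reward 1.
samples = [(0, 0.0, 1), (1, 1.0, 1)] * 50
theta = lstd(samples, one_hot, gamma=0.9)
print(theta)
```

On this toy chain the exact values are V(1) = 1 / (1 - 0.9) = 10 and V(0) = 0.9 * 10 = 9, so `theta` should come out close to `[9, 10]`. Both the feature map and the operator are linear here, which is exactly the restriction the abstract says the paper lifts.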

Keywords :

function approximation; learning (artificial intelligence); least squares approximations; neural nets; statistical analysis; Bellman evaluation operator; LSTD algorithm; Q-learning-like problem; derivative-free statistical linearization; linear parametrization; neural networks; nonlinear parameterization; reinforcement learning algorithm; statistically linearized least-squares temporal difference algorithm; value function approximator; Approximation algorithms; Artificial neural networks; Function approximation; Noise; Optimization; Transforms; neural networks; reinforcement learning; statistical linearization; value function approximation;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Ultra Modern Telecommunications and Control Systems and Workshops (ICUMT), 2010 International Congress on

Conference_Location :

Moscow

ISSN :

2157-0221

Print_ISBN :

978-1-4244-7285-7

Type :

conf

DOI :

10.1109/ICUMT.2010.5676598

Filename :

5676598

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=1866921