مرکز منطقه ای اطلاع رساني علوم و فناوري - Temporal Difference Methods for General Projected Equations

DocumentCode :

1450655

Title :

Temporal Difference Methods for General Projected Equations

Author :

Bertsekas, Dimitri P.

Author_Institution :

Lab. for Inf. & Decision Syst. (LIDS), Massachusetts Inst. of Technol., Cambridge, MA, USA

Volume :

Issue :

fYear :

2011

Firstpage :

2128

Lastpage :

2139

Abstract :

We consider projected equations for approximate solution of high-dimensional fixed point problems within low-dimensional subspaces. We introduce an analytical framework based on an equivalence with variational inequalities, and algorithms that may be implemented with low-dimensional simulation. These algorithms originated in approximate dynamic programming (DP), where they are collectively known as temporal difference (TD) methods. Even when specialized to DP, our methods include extensions/new versions of TD methods, which offer special implementation advantages and reduced overhead over the standard LSTD and LSPE methods, and can deal with near singularity in the associated matrix inversion. We develop deterministic iterative methods and their simulation-based versions, and we discuss a sharp qualitative distinction between them: the performance of the former is greatly affected by direction and feature scaling, yet the latter have the same asymptotic convergence rate regardless of scaling, because of their common simulation-induced performance bottleneck.

Keywords :

convergence; difference equations; dynamic programming; iterative methods; variational techniques; asymptotic convergence rate; direction scaling; dynamic programming; feature scaling; general projected equations; high-dimensional fixed point problems; iterative methods; low-dimensional subspaces; matrix inversion; simulation-based versions; temporal difference methods; Convergence; Equations; Iterative methods; Least squares approximation; Mathematical model; Symmetric matrices; Approximation methods; Markov decision processes; dynamic programming; reinforcement learning; temporal difference methods;

fLanguage :

English

Journal_Title :

Automatic Control, IEEE Transactions on

Publisher :

ieee

ISSN :

0018-9286

Type :

jour

DOI :

10.1109/TAC.2011.2115290

Filename :

5713814

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=1450655