Regional Information Center for Science and Technology (RICeST) - Q-learning and enhanced policy iteration in discounted dynamic programming

DocumentCode :

2580412

Title :

Q-learning and enhanced policy iteration in discounted dynamic programming

Author :

Bertsekas, Dimitri P. ; Yu, Huizhen

Author_Institution :

Dept. of Electr. Eng. & Comp. Sci., M.I.T., Cambridge, MA, USA

fYear :

2010

fDate :

15-17 Dec. 2010

Firstpage :

1409

Lastpage :

1416

Abstract :

We consider the classical finite-state discounted Markovian decision problem, and we introduce a new policy iteration-like algorithm for finding the optimal Q-factors. Instead of policy evaluation by solving a linear system of equations, our algorithm involves (possibly inexact) solution of an optimal stopping problem. This problem can be solved with simple Q-learning iterations, in the case where a lookup table representation is used; it can also be solved with the Q-learning algorithm of Tsitsiklis and Van Roy [TsV99], in the case where feature-based Q-factor approximations are used. In exact/lookup table representation form, our algorithm admits asynchronous and stochastic iterative implementations, in the spirit of asynchronous/modified policy iteration, with lower overhead advantages over existing Q-learning schemes. Furthermore, for large-scale problems, where linear basis function approximations and simulation-based temporal difference implementations are used, our algorithm resolves effectively the inherent difficulties of existing schemes due to inadequate exploration.
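The scheme outlined in the abstract can be illustrated with a minimal lookup-table sketch. It is one reading of the abstract, not the authors' implementation: the two-state problem data (`P`, `G`, `ALPHA`), the function names, and the iteration counts are all hypothetical, and the updates in the paper may differ in detail. Policy evaluation is replaced here by a few sweeps of an optimal stopping iteration, in which the stopping cost at state `j` is the current estimate `J[j]` and continuing means following the current policy `mu`. Plain Q-factor value iteration is included only as a reference for comparison.

```python
# Hypothetical toy problem: 2 states, 2 actions, costs are minimized.
ALPHA = 0.9  # discount factor (assumed value)

# P[i][u][j]: probability of moving from state i to state j under action u.
P = [
    [[0.8, 0.2], [0.1, 0.9]],
    [[0.5, 0.5], [0.3, 0.7]],
]
# G[i][u][j]: one-stage cost of that transition.
G = [
    [[1.0, 4.0], [2.0, 0.5]],
    [[3.0, 1.0], [0.0, 2.0]],
]


def q_value_iteration(P, G, alpha, sweeps=600):
    """Reference solution: standard value iteration on Q-factors."""
    n, m = len(P), len(P[0])
    Q = [[0.0] * m for _ in range(n)]
    for _ in range(sweeps):
        J = [min(row) for row in Q]  # J(j) = min over actions of Q(j, v)
        Q = [[sum(P[i][u][j] * (G[i][u][j] + alpha * J[j]) for j in range(n))
              for u in range(m)] for i in range(n)]
    return Q


def stopping_policy_iteration(P, G, alpha, outer=300, inner=3):
    """Policy-iteration-like scheme with inexact stopping-problem evaluation."""
    n, m = len(P), len(P[0])
    Q = [[0.0] * m for _ in range(n)]
    for _ in range(outer):
        # Policy improvement: greedy policy and stopping costs from current Q.
        J = [min(row) for row in Q]
        mu = [row.index(min(row)) for row in Q]
        # Inexact evaluation: a few sweeps of the optimal stopping iteration.
        # At the next state j, either stop at cost J[j] or continue with mu.
        for _ in range(inner):
            Q = [[sum(P[i][u][j] * (G[i][u][j]
                                    + alpha * min(J[j], Q[j][mu[j]]))
                      for j in range(n))
                  for u in range(m)] for i in range(n)]
    return Q


Q_ref = q_value_iteration(P, G, ALPHA)
Q_new = stopping_policy_iteration(P, G, ALPHA)
max_gap = max(abs(Q_ref[i][u] - Q_new[i][u])
              for i in range(len(P)) for u in range(len(P[0])))
print("largest difference from reference Q-factors:", max_gap)
```

On this toy problem both routines should agree to within numerical tolerance. The `min(J[j], Q[j][mu[j]])` term is what distinguishes the sketch from ordinary policy evaluation, which would use `Q[j][mu[j]]` alone and require solving a linear system for an exact answer.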

Keywords :

Markov processes; Q-factor; dynamic programming; function approximation; iterative methods; learning systems; table lookup; Q-learning; feature-based Q-factor approximations; finite-state discounted Markovian decision problem; large-scale problems; linear basis function approximations; lookup table representation; optimal Q-factors; optimal stopping problem; policy iteration; simulation-based temporal difference; stochastic iterative implementations; Approximation algorithms; Approximation methods; Context; Convergence; Equations; Minimization

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Decision and Control (CDC), 2010 49th IEEE Conference on

Conference_Location :

Atlanta, GA

ISSN :

0743-1546

Print_ISBN :

978-1-4244-7745-6

Type :

conf

DOI :

10.1109/CDC.2010.5717930

Filename :

5717930

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2580412