DocumentCode :
3400185
Title :
Algorithms for variance reduction in a policy-gradient based actor-critic framework
Author :
Awate, Yogesh P.
Author_Institution :
marketRx - A Cognizant Co., Gurgaon
fYear :
2009
fDate :
March 30 - April 2, 2009
Firstpage :
130
Lastpage :
136
Abstract :
We consider the framework of a set of recently proposed two-timescale actor-critic algorithms for reinforcement learning (RL) using the long-run average-reward criterion and linear, feature-based value-function approximation. The actor and critic updates are based on stochastic policy-gradient ascent and temporal-difference algorithms, respectively. Unlike conventional RL algorithms, policy-gradient-based algorithms guarantee convergence even with value-function approximation, but they suffer from the high variance of the policy-gradient estimator. To minimize this variance for an existing algorithm, we derive a novel stochastic-gradient-based critic update. We propose a novel baseline structure for minimizing the variance of an estimator and derive an optimal baseline that makes the covariance matrix the zero matrix, the best achievable. Using the optimal baseline deduced for an existing algorithm, we derive a novel actor update. We derive another novel actor update using the optimal baseline for an unbiased policy-gradient estimator, which we deduce from the policy-gradient theorem with function approximation. We also obtain a novel variance-minimization-based interpretation of an existing algorithm. Computational results demonstrate that the proposed algorithms outperform the state of the art on Garnet problems.
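The setting described in the abstract (an actor updated by stochastic policy-gradient ascent on a slow timescale, a linear temporal-difference critic on a fast timescale, and a state-value baseline subtracted to reduce the variance of the gradient estimate) can be illustrated with a minimal sketch. This is not the paper's algorithms or baselines; the toy MDP, feature matrix, step sizes, and all names below are illustrative assumptions.

```python
import numpy as np

# Minimal sketch (assumptions, not the paper's algorithms): a two-timescale
# average-reward actor-critic with a linear critic and a softmax actor.
# The critic's state-value estimate serves as the baseline subtracted in
# the TD error, which reduces the variance of the policy-gradient update.

rng = np.random.default_rng(0)
n_states, n_actions, n_feats = 5, 3, 4

# Random toy MDP: P[s, a] is a next-state distribution, R[s, a] a mean reward.
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
R = rng.normal(size=(n_states, n_actions))

# Fixed random state features for the linear critic.
phi = rng.normal(size=(n_states, n_feats))

theta = np.zeros((n_states, n_actions))  # actor: tabular softmax policy
v = np.zeros(n_feats)                    # critic: linear value weights
rho = 0.0                                # running average-reward estimate

def policy(s):
    z = theta[s] - theta[s].max()
    p = np.exp(z)
    return p / p.sum()

s = 0
for t in range(50_000):
    pi = policy(s)
    a = rng.choice(n_actions, p=pi)
    r = R[s, a]
    s_next = rng.choice(n_states, p=P[s, a])

    # Two timescales: critic and average-reward estimate move faster than the actor.
    alpha_critic, alpha_actor = 0.05, 0.005

    # Average-reward TD error; phi[s] @ v is the state-value baseline.
    delta = r - rho + phi[s_next] @ v - phi[s] @ v
    rho += alpha_critic * (r - rho)
    v += alpha_critic * delta * phi[s]

    # Softmax policy-gradient actor update driven by the TD error.
    grad_log = -pi
    grad_log[a] += 1.0
    theta[s] += alpha_actor * delta * grad_log

    s = s_next

print("estimated average reward:", round(rho, 3))
```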
Keywords :
covariance matrices; function approximation; gradient methods; learning (artificial intelligence); stochastic processes; Garnet problems; covariance matrix; long-run average-reward criterion; reinforcement-learning; stochastic policy-gradient ascent; temporal-difference algorithms; two-timescale actor-critic algorithms; value-function approximation; variance reduction; Approximation algorithms; Convergence; Covariance matrix; Function approximation; Garnets; Learning; Linear approximation; State-space methods; Stochastic processes; Table lookup;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
2009 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL '09)
Conference_Location :
Nashville, TN
Print_ISBN :
978-1-4244-2761-1
Type :
conf
DOI :
10.1109/ADPRL.2009.4927536
Filename :
4927536