• DocumentCode
    3400185
  • Title
    Algorithms for variance reduction in a policy-gradient based actor-critic framework
  • Author
    Awate, Yogesh P.
  • Author_Institution
    marketRx - A Cognizant Co., Gurgaon
  • fYear
    2009
  • fDate
    March 30 2009-April 2 2009
  • Firstpage
    130
  • Lastpage
    136
  • Abstract
    We consider a set of recently proposed two-timescale actor-critic algorithms for reinforcement learning (RL) that use the long-run average-reward criterion and linear feature-based value-function approximation. The actor and critic updates are based on stochastic policy-gradient ascent and temporal-difference algorithms, respectively. Unlike conventional RL algorithms, policy-gradient-based algorithms guarantee convergence even with value-function approximation, but they suffer from high variance of the policy-gradient estimator. To minimize this variance for an existing algorithm, we derive a novel stochastic-gradient-based critic update. We propose a novel baseline structure for minimizing the variance of an estimator and derive an optimal baseline that makes the covariance matrix a zero matrix, the best achievable result. We derive a novel actor update based on the optimal baseline deduced for an existing algorithm. We derive another novel actor update using the optimal baseline for an unbiased policy-gradient estimator, which we deduce from the policy-gradient theorem with function approximation. We also obtain a novel variance-minimization-based interpretation of an existing algorithm. The computational results demonstrate that the proposed algorithms outperform the state of the art on Garnet problems.
  • Keywords
    covariance matrices; function approximation; gradient methods; learning (artificial intelligence); stochastic processes; Garnet problems; covariance matrix; long-run average-reward criterion; reinforcement-learning; stochastic policy-gradient ascent; temporal-difference algorithms; two-timescale actor-critic algorithms; value-function approximation; variance reduction; Approximation algorithms; Convergence; Covariance matrix; Function approximation; Garnets; Learning; Linear approximation; State-space methods; Stochastic processes; Table lookup
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Adaptive Dynamic Programming and Reinforcement Learning, 2009. ADPRL '09. IEEE Symposium on
  • Conference_Location
    Nashville, TN
  • Print_ISBN
    978-1-4244-2761-1
  • Type
    conf
  • DOI
    10.1109/ADPRL.2009.4927536
  • Filename
    4927536