  • DocumentCode
    810295
  • Title
    A basic formula for online policy gradient algorithms
  • Author
    Cao, Xi-Ren
  • Author_Institution
    Hong Kong Univ. of Sci. & Technol., Kowloon, China
  • Volume
    50
  • Issue
    5
  • fYear
    2005
  • fDate
    5/1/2005
  • Firstpage
    696
  • Lastpage
    699
  • Abstract
    This note presents a (new) basic formula for sample-path-based estimates of performance gradients in Markov systems (called policy gradients in the reinforcement learning literature). With this basic formula, many policy-gradient algorithms, including those that have previously appeared in the literature, can be easily developed. The formula follows naturally from a sensitivity equation in perturbation analysis. A new research direction is discussed.
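    For context, the sensitivity equation from perturbation analysis that the abstract refers to is usually written, in the standard notation of this literature, roughly as follows; the symbols here (\theta, P_\theta, \pi_\theta, f, g) are generic placeholders and need not match the note's own notation:

      \[
        \eta(\theta) = \pi_\theta f, \qquad
        (I - P_\theta)\,g + \eta(\theta)\,e = f, \qquad
        \frac{d\eta}{d\theta} = \pi_\theta \, \frac{dP_\theta}{d\theta}\, g,
      \]

    where P_\theta is the transition matrix under policy parameter \theta, \pi_\theta is its steady-state distribution, f is the reward function, e is the all-ones vector, and g is the vector of performance potentials defined by the Poisson equation above. Sample-path (online) policy-gradient estimators then arise by estimating the potentials g from a single observed trajectory.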
  • Keywords
    Markov processes; control system analysis; gradient methods; learning (artificial intelligence); perturbation techniques; stochastic systems; Markov system; online policy gradient algorithm; perturbation analysis (PA); reinforcement learning; sample-path-based estimates; algorithm design and analysis; approximation algorithms; optimization; performance analysis; Poisson equations; steady-state; terminology; Markov decision processes; online estimation; perturbation realization; potentials
  • fLanguage
    English
  • Journal_Title
    IEEE Transactions on Automatic Control
  • Publisher
    IEEE
  • ISSN
    0018-9286
  • Type
    jour
  • DOI
    10.1109/TAC.2005.847037
  • Filename
    1431053