Title :
A basic formula for online policy gradient algorithms
Author_Institution :
Hong Kong Univ. of Sci. & Technol., Kowloon, China
Date :
5/1/2005 12:00:00 AM
Abstract :
This note presents a new basic formula for sample-path-based estimates of performance gradients in Markov systems (called policy gradients in the reinforcement learning literature). With this basic formula, many policy-gradient algorithms, including those that have previously appeared in the literature, can be derived easily. The formula follows naturally from a sensitivity equation in perturbation analysis. New research directions are discussed.
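The abstract's sensitivity equation from perturbation analysis can be illustrated numerically. A minimal sketch, assuming the standard potential-based form of that equation, dη/dθ = π (dP/dθ) g, where π is the stationary distribution, g the vector of performance potentials solving the Poisson equation, and η the steady-state average reward; the 2-state chain, rewards, and parameterization below are illustrative, not taken from the paper.

```python
import numpy as np

def stationary(P):
    """Stationary distribution pi of an ergodic transition matrix P."""
    n = P.shape[0]
    A = np.vstack([P.T - np.eye(n), np.ones(n)])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    return np.linalg.lstsq(A, b, rcond=None)[0]

def potentials(P, f):
    """Potentials g solving the Poisson equation (I - P) g = f - eta*1
    (up to an additive constant), via the fundamental matrix."""
    n = P.shape[0]
    pi = stationary(P)
    eta = pi @ f
    g = np.linalg.solve(np.eye(n) - P + np.outer(np.ones(n), pi), f - eta)
    return g, eta, pi

def P_of(theta):
    # Illustrative parameterized 2-state transition matrix.
    return np.array([[1 - theta, theta],
                     [0.5, 0.5]])

f = np.array([1.0, 3.0])            # per-state reward
theta = 0.3
g, eta, pi = potentials(P_of(theta), f)

dP = np.array([[-1.0, 1.0],         # dP/dtheta, computed analytically here
               [0.0, 0.0]])

grad = pi @ dP @ g                  # sensitivity formula d(eta)/d(theta)

# Cross-check against a central finite difference on eta(theta).
eps = 1e-6
_, eta_plus, _ = potentials(P_of(theta + eps), f)
_, eta_minus, _ = potentials(P_of(theta - eps), f)
print(grad, (eta_plus - eta_minus) / (2 * eps))
```

Online policy-gradient algorithms of the kind the note discusses replace the exact π and g here with sample-path estimates collected along a single trajectory.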
Keywords :
Markov processes; control system analysis; gradient methods; learning (artificial intelligence); perturbation techniques; stochastic systems; Markov system; online policy gradient algorithm; perturbation analysis (PA); reinforcement learning; sample-path-based estimates; algorithm design and analysis; approximation algorithms; optimization; performance analysis; Poisson equations; steady-state; terminology; Markov decision processes; online estimation; perturbation realization; potentials
Journal_Title :
IEEE Transactions on Automatic Control
DOI :
10.1109/TAC.2005.847037