DocumentCode :
810295
Title :
A basic formula for online policy gradient algorithms
Author :
Cao, Xi-Ren
Author_Institution :
Hong Kong Univ. of Sci. & Technol., Kowloon, China
Volume :
50
Issue :
5
fYear :
2005
fDate :
5/1/2005
Firstpage :
696
Lastpage :
699
Abstract :
This note presents a new basic formula for sample-path-based estimates of performance gradients in Markov systems (called policy gradients in the reinforcement learning literature). With this basic formula, many policy-gradient algorithms, including those that have previously appeared in the literature, can be developed easily. The formula follows naturally from a sensitivity equation in perturbation analysis. A new research direction is also discussed.
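A minimal sketch, assuming an ergodic Markov chain with parameterized transition matrix P(\theta), reward vector f, stationary distribution \pi(\theta), and average reward \eta(\theta) = \pi(\theta) f: the potential-based gradient identity from perturbation analysis (the form this line of work builds on, not necessarily the note's exact statement) reads

\[
\frac{d\eta}{d\theta} \;=\; \pi \,\frac{dP}{d\theta}\, g,
\qquad
(I - P + e\,\pi)\, g = f, \quad e = (1,\dots,1)^{\top},
\]

where g is the vector of performance potentials solving the Poisson equation. Along a sample path, g(j) can be approximated by truncated sums of centered rewards, g(j) \approx \mathbb{E}\big[\sum_{k=0}^{K-1} \big(f(X_k) - \eta\big) \,\big|\, X_0 = j\big], which is what makes online, sample-path-based gradient estimation possible.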
Keywords :
Markov processes; control system analysis; gradient methods; learning (artificial intelligence); perturbation techniques; stochastic systems; Markov system; online policy gradient algorithm; perturbation analysis (PA); reinforcement learning; sample-path-based estimates; algorithm design and analysis; approximation algorithms; optimization; performance analysis; Poisson equations; steady-state; terminology; Markov decision processes; online estimation; perturbation realization; potentials
fLanguage :
English
Journal_Title :
IEEE Transactions on Automatic Control
Publisher :
IEEE
ISSN :
0018-9286
Type :
jour
DOI :
10.1109/TAC.2005.847037
Filename :
1431053