Title :
A basic formula for online policy gradient algorithms
Author_Institution :
Hong Kong Univ. of Sci. & Technol., Kowloon, China
Date :
5/1/2005 12:00:00 AM
Abstract :
This note presents a new basic formula for sample-path-based estimates of performance gradients in Markov systems (called policy gradients in the reinforcement learning literature). With this basic formula, many policy-gradient algorithms, including those that have previously appeared in the literature, can be derived easily. The formula follows naturally from a sensitivity equation in perturbation analysis. New research directions are discussed.
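The abstract's sensitivity equation from perturbation analysis can be illustrated numerically. A minimal sketch, assuming the standard potential-based form of that equation, dη/dθ = π (dP/dθ) g, where π is the stationary distribution, g the vector of performance potentials solving the Poisson equation, and η the steady-state average reward; the 2-state chain, rewards, and parameterization below are illustrative, not taken from the paper.

```python
import numpy as np

def stationary(P):
    """Stationary distribution pi of an ergodic transition matrix P."""
    n = P.shape[0]
    A = np.vstack([P.T - np.eye(n), np.ones(n)])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    return np.linalg.lstsq(A, b, rcond=None)[0]

def potentials(P, f):
    """Potentials g solving the Poisson equation (I - P) g = f - eta*1
    (up to an additive constant), via the fundamental matrix."""
    n = P.shape[0]
    pi = stationary(P)
    eta = pi @ f
    g = np.linalg.solve(np.eye(n) - P + np.outer(np.ones(n), pi), f - eta)
    return g, eta, pi

def P_of(theta):
    # Illustrative parameterized 2-state transition matrix.
    return np.array([[1 - theta, theta],
                     [0.5, 0.5]])

f = np.array([1.0, 3.0])            # per-state reward
theta = 0.3
g, eta, pi = potentials(P_of(theta), f)

dP = np.array([[-1.0, 1.0],         # dP/dtheta, computed analytically here
               [0.0, 0.0]])

grad = pi @ dP @ g                  # sensitivity formula d(eta)/d(theta)

# Cross-check against a central finite difference on eta(theta).
eps = 1e-6
_, eta_plus, _ = potentials(P_of(theta + eps), f)
_, eta_minus, _ = potentials(P_of(theta - eps), f)
print(grad, (eta_plus - eta_minus) / (2 * eps))
```

Online policy-gradient algorithms of the kind the note discusses replace the exact π and g here with sample-path estimates collected along a single trajectory.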
Keywords :
Markov processes; control system analysis; gradient methods; learning (artificial intelligence); perturbation techniques; stochastic systems; Markov system; online policy gradient algorithm; perturbation analysis (PA); reinforcement learning; sample-path-based estimates; algorithm design and analysis; approximation algorithms; optimization; performance analysis; Poisson equations; steady-state; terminology; Markov decision processes; online estimation; perturbation realization; potentials
Journal_Title :
IEEE Transactions on Automatic Control
DOI :
10.1109/TAC.2005.847037