Title :
Reinforcement learning algorithms for semi-Markov decision processes with average reward
Author_Institution :
Shenzhen Grad. Sch., Harbin Inst. of Technol., Shenzhen, China
Abstract :
In this paper, we study reinforcement learning (RL) algorithms for semi-Markov decision processes (SMDPs) with average reward from the perspective of performance sensitivity analysis. We first present performance sensitivity analysis results for average-reward SMDPs. Based on these results, two RL algorithms are studied. The first is a relative value iteration (RVI) RL algorithm, which avoids estimating the optimal average reward during learning. The second is a policy gradient estimation algorithm, which extends the policy gradient estimation algorithm for discrete-time Markov decision processes (MDPs) to SMDPs and requires only half the storage of the existing algorithm.
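The key idea of the RVI approach mentioned above is to subtract the value of a fixed reference state-action pair in each update, so that no separate running estimate of the optimal average reward is needed. The following is a minimal sketch of an RVI Q-learning update for an average-reward SMDP; the toy two-state model (transition probabilities `P`, lump-sum rewards `R`, sojourn times `TAU`), the step size, and the exploration rate are all hypothetical illustrations, not taken from the paper.

```python
import random

# Hypothetical 2-state, 2-action SMDP for illustration only.
# P[(s, a)] = probability of moving to state 0; R = lump-sum reward;
# TAU = sojourn time spent in state s when action a is taken.
P = {(0, 0): 0.7, (0, 1): 0.4, (1, 0): 0.9, (1, 1): 0.2}
R = {(0, 0): 6.0, (0, 1): 4.0, (1, 0): -5.0, (1, 1): 12.0}
TAU = {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 1.5, (1, 1): 3.0}


def rvi_q_learning(steps=20000, alpha=0.05, eps=0.1, seed=0):
    """RVI Q-learning sketch for an average-reward SMDP.

    The reference pair Q[ref] plays the role of the average-reward
    estimate: the update uses r - Q[ref] * tau, so the optimal average
    reward is never estimated explicitly.
    """
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in (0, 1) for a in (0, 1)}
    ref = (0, 0)  # fixed reference state-action pair
    s = 0
    for _ in range(steps):
        # epsilon-greedy action selection
        if rng.random() < eps:
            a = rng.choice((0, 1))
        else:
            a = max((0, 1), key=lambda b: Q[(s, b)])
        s2 = 0 if rng.random() < P[(s, a)] else 1
        r, tau = R[(s, a)], TAU[(s, a)]
        # sojourn time scales the reference term in the SMDP setting
        target = r - Q[ref] * tau + max(Q[(s2, 0)], Q[(s2, 1)])
        Q[(s, a)] += alpha * (target - Q[(s, a)])
        s = s2
    return Q
```

After training, the greedy policy with respect to `Q` approximates an average-reward-optimal policy for the toy model, while `Q[ref]` implicitly tracks the average-reward level.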
Keywords :
Markov processes; decision making; gradient methods; learning (artificial intelligence); RL algorithm; RVI; average-reward SMDP; discrete time Markov decision processes; performance sensitivity analysis; policy gradient estimation algorithm; reinforcement learning algorithms; relative value iteration; semi-Markov decision processes; sequential decision-making problems; Algorithm design and analysis; Approximation algorithms; Equations; Estimation; Markov processes; Q factor; Sensitivity analysis;
Conference_Title :
2012 9th IEEE International Conference on Networking, Sensing and Control (ICNSC)
Conference_Location :
Beijing
Print_ISBN :
978-1-4673-0388-0
DOI :
10.1109/ICNSC.2012.6204909