DocumentCode :
2165699
Title :
Reinforcement learning algorithms for semi-Markov decision processes with average reward
Author :
Li, Yanjie
Author_Institution :
Shenzhen Grad. Sch., Harbin Inst. of Technol., Shenzhen, China
fYear :
2012
fDate :
11-14 April 2012
Firstpage :
157
Lastpage :
162
Abstract :
In this paper, we study reinforcement learning (RL) algorithms for semi-Markov decision processes (SMDPs) with average reward from the perspective of performance sensitivity analysis. We first present performance sensitivity analysis results for average-reward SMDPs. On this basis, two RL algorithms for average-reward SMDPs are studied. The first is a relative value iteration (RVI) RL algorithm, which avoids estimating the optimal average reward during learning. The second is a policy gradient estimation algorithm, which extends the policy gradient estimation algorithm for discrete-time Markov decision processes (MDPs) to SMDPs and requires only half the storage of the existing algorithm.
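(A minimal sketch of the RVI idea mentioned above, for illustration only: in RVI-style Q-learning for an average-reward SMDP, the explicit estimate of the optimal average reward is replaced by the Q-value at a reference state-action pair, and the sojourn time scales that offset. The environment interface env.step(s, a) returning (reward, sojourn_time, next_state) and all parameter names are assumptions, not the paper's exact update.)

import numpy as np

def rvi_q_learning_smdp(env, n_states, n_actions, steps=10_000,
                        alpha=0.1, epsilon=0.1, ref=(0, 0)):
    # Tabular RVI-style Q-learning sketch for an average-reward SMDP.
    # `ref` is a reference state-action pair whose Q-value serves as the
    # average-reward offset, so no separate average-reward estimate is kept.
    Q = np.zeros((n_states, n_actions))
    s = 0
    for _ in range(steps):
        # epsilon-greedy action selection
        if np.random.rand() < epsilon:
            a = np.random.randint(n_actions)
        else:
            a = int(np.argmax(Q[s]))
        # hypothetical SMDP transition: reward, sojourn time, next state
        r, tau, s_next = env.step(s, a)
        rho_hat = Q[ref]  # RVI offset in place of an explicit average-reward estimate
        target = r - rho_hat * tau + np.max(Q[s_next])
        Q[s, a] += alpha * (target - Q[s, a])
        s = s_next
    return Q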
Keywords :
Markov processes; decision making; gradient methods; learning (artificial intelligence); RL algorithm; RVI; average-reward SMDP; discrete time Markov decision processes; performance sensitivity analysis; policy gradient estimation algorithm; reinforcement learning algorithms; relative value iteration; semi-Markov decision processes; sequential decision-making problems; Algorithm design and analysis; Approximation algorithms; Equations; Estimation; Markov processes; Q factor; Sensitivity analysis;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
Networking, Sensing and Control (ICNSC), 2012 9th IEEE International Conference on
Conference_Location :
Beijing
Print_ISBN :
978-1-4673-0388-0
Type :
conf
DOI :
10.1109/ICNSC.2012.6204909
Filename :
6204909