Title :
Reinforcement learning algorithms for semi-Markov decision processes with average reward
Author_Institution :
Shenzhen Grad. Sch., Harbin Inst. of Technol., Shenzhen, China
Abstract :
In this paper, we study reinforcement learning (RL) algorithms for semi-Markov decision processes (SMDPs) with average reward from the perspective of performance sensitivity analysis. We first present performance sensitivity analysis results for average-reward SMDPs. Based on these results, two RL algorithms are studied. The first is a relative value iteration (RVI) RL algorithm, which avoids estimating the optimal average reward during learning. The second is a policy gradient estimation algorithm, which extends the policy gradient estimation algorithm for discrete-time Markov decision processes (MDPs) to SMDPs and requires only half the storage of the existing algorithm.
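The key idea of the RVI approach mentioned above is to subtract the value of a fixed reference state-action pair in each update, so that no separate running estimate of the optimal average reward is needed. The following is a minimal sketch of an RVI Q-learning update for an average-reward SMDP; the toy two-state model (transition probabilities `P`, lump-sum rewards `R`, sojourn times `TAU`), the step size, and the exploration rate are all hypothetical illustrations, not taken from the paper.

```python
import random

# Hypothetical 2-state, 2-action SMDP for illustration only.
# P[(s, a)] = probability of moving to state 0; R = lump-sum reward;
# TAU = sojourn time spent in state s when action a is taken.
P = {(0, 0): 0.7, (0, 1): 0.4, (1, 0): 0.9, (1, 1): 0.2}
R = {(0, 0): 6.0, (0, 1): 4.0, (1, 0): -5.0, (1, 1): 12.0}
TAU = {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 1.5, (1, 1): 3.0}


def rvi_q_learning(steps=20000, alpha=0.05, eps=0.1, seed=0):
    """RVI Q-learning sketch for an average-reward SMDP.

    The reference pair Q[ref] plays the role of the average-reward
    estimate: the update uses r - Q[ref] * tau, so the optimal average
    reward is never estimated explicitly.
    """
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in (0, 1) for a in (0, 1)}
    ref = (0, 0)  # fixed reference state-action pair
    s = 0
    for _ in range(steps):
        # epsilon-greedy action selection
        if rng.random() < eps:
            a = rng.choice((0, 1))
        else:
            a = max((0, 1), key=lambda b: Q[(s, b)])
        s2 = 0 if rng.random() < P[(s, a)] else 1
        r, tau = R[(s, a)], TAU[(s, a)]
        # sojourn time scales the reference term in the SMDP setting
        target = r - Q[ref] * tau + max(Q[(s2, 0)], Q[(s2, 1)])
        Q[(s, a)] += alpha * (target - Q[(s, a)])
        s = s2
    return Q
```

After training, the greedy policy with respect to `Q` approximates an average-reward-optimal policy for the toy model, while `Q[ref]` implicitly tracks the average-reward level.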
Keywords :
Markov processes; decision making; gradient methods; learning (artificial intelligence); RL algorithm; RVI; average-reward SMDP; discrete time Markov decision processes; performance sensitivity analysis; policy gradient estimation algorithm; reinforcement learning algorithms; relative value iteration; semi-Markov decision processes; sequential decision-making problems; Algorithm design and analysis; Approximation algorithms; Equations; Estimation; Markov processes; Q factor; Sensitivity analysis;
Conference_Title :
2012 9th IEEE International Conference on Networking, Sensing and Control (ICNSC)
Conference_Location :
Beijing
Print_ISBN :
978-1-4673-0388-0
DOI :
10.1109/ICNSC.2012.6204909