Regional Information Center for Science and Technology - Continuous-time Markov decision process with average reward: Using reinforcement learning method

DocumentCode :

2250603

Title :

Continuous-time Markov decision process with average reward: Using reinforcement learning method

Author :

Jia, Shengde ; Shen, Lincheng ; Xue, Hongtao

Author_Institution :

College of Mechatronic Engineering and Automation, National University of Defense Technology, Changsha, Hunan 410073, P.R. China

fYear :

2015

fDate :

28-30 July 2015

Firstpage :

3097

Lastpage :

3100

Abstract :

The Markov decision process (MDP) is a foundational framework for reinforcement learning in sequential decision problems. The continuous-time Markov decision process (CTMDP) extends the discrete-time MDP model by allowing actions to be taken at any time. Prior work has given little consideration to reinforcement learning methods for solving CTMDPs. This article presents a reinforcement learning approach based on sample paths. Building on the key concept of the performance potential function, a policy iteration algorithm with average reward is presented. Then, through the Robbins-Monro method, a temporal difference formula for evaluating the performance potential function is proposed. Simulation results indicate that the presented algorithms converge to the solution of the CTMDP problem at a reasonable speed.
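The abstract's temporal difference evaluation of the performance potential can be illustrated with a minimal sketch. The record does not give the paper's exact formula, so this uses a generic average-reward update, delta = (f(s) - eta) * tau + g(s') - g(s), with Robbins-Monro step sizes 1/n. The function name `td_potentials`, the two-state chain, its rates, and its reward rates are all hypothetical choices made for illustration, and the policy is held fixed (evaluation only, no policy improvement step).

```python
import random

def td_potentials(rates, rewards, next_state, steps=200_000, seed=0):
    """Estimate average reward eta and potentials g along one sample path.

    rates[s]   : exit rate of state s under the fixed policy (assumed known
                 only to the simulator, not to the learner)
    rewards[s] : reward rate earned per unit time in state s
    next_state : hypothetical helper (s, rng) -> successor state
    """
    rng = random.Random(seed)
    n = len(rates)
    g = [0.0] * n            # potential estimates
    visits = [0] * n         # per-state visit counts for step sizes
    total_reward = 0.0
    total_time = 0.0
    s = 0
    for _ in range(steps):
        tau = rng.expovariate(rates[s])      # exponential sojourn time
        s_next = next_state(s, rng)
        total_reward += rewards[s] * tau
        total_time += tau
        eta = total_reward / total_time      # running average-reward estimate
        visits[s] += 1
        alpha = 1.0 / visits[s]              # Robbins-Monro step size
        # Temporal difference for the average-reward Poisson equation
        delta = (rewards[s] - eta) * tau + g[s_next] - g[s]
        g[s] += alpha * delta
        s = s_next
    return eta, g

# Illustrative two-state chain that alternates 0 -> 1 -> 0 -> ...
ETA, G = td_potentials(
    rates=[1.0, 2.0],
    rewards=[3.0, 0.0],
    next_state=lambda s, rng: 1 - s,
)
print(ETA, G[0] - G[1])
```

For this toy chain the stationary distribution is (2/3, 1/3), so the exact average reward is 2 and the Poisson equation gives g(0) - g(1) = 1; the estimates should settle near those values. Only differences of potentials are meaningful, since the potential function is determined up to an additive constant.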

Keywords :

Learning (artificial intelligence); Markov processes; Mathematical model; Poisson equations; Process control; Random variables; Steady-state; Continuous-time Markov Decision Process; Reinforcement Learning; performance potential function;

fLanguage :

English

Publisher :

IEEE

Conference_Titel :

2015 34th Chinese Control Conference (CCC)

Conference_Location :

Hangzhou, China

Type :

conf

DOI :

10.1109/ChiCC.2015.7260117

Filename :

7260117

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2250603