Infinite-horizon gradient estimation for semi-Markov decision processes

Author

Li, Yanjie ; Cao, Fang

Author_Institution

Shenzhen Grad. Sch., Harbin Inst. of Technol., Shenzhen, China

fYear

2011

fDate

15-18 May 2011

Firstpage

926

Lastpage

931

Abstract

This paper presents a performance gradient formula for semi-Markov decision processes under the average-reward criterion. Based on this formula, we propose an infinite-horizon online (sample-path-based) gradient estimation algorithm. The algorithm naturally extends the online gradient estimation algorithm for discrete-time Markov systems to continuous-time semi-Markov models. In particular, the new algorithm requires less storage than the algorithm previously reported in the literature.

Keywords

Markov processes; continuous time systems; decision theory; discrete time systems; gradient methods; average reward criterion; continuous time semi-Markov models; discrete time Markov systems; infinite horizon online gradient estimation algorithm; semi-Markov decision processes; Algorithm design and analysis; Approximation algorithms; Approximation methods; Equations; Estimation; Optimization

fLanguage

English

Publisher

IEEE

Conference_Titel

Control Conference (ASCC), 2011 8th Asian

Conference_Location

Kaohsiung

Print_ISBN

978-1-61284-487-9

Electronic_ISBN

978-89-956056-4-6

Type

conf

Filename

5899196

Link To Document

https://search.isc.ac/dl/search/defaultta.aspx?DTC=49&DC=1805450