Regional Information Center for Science and Technology - Policy Gradient Semi-Markov Decision Process

DocumentCode :

3336858

Title :

Policy Gradient Semi-Markov Decision Process

Author :

Vien, Ngo Anh ; Chung, TaeChoong

Author_Institution :

Artificial Intelligence Lab., Kyung Hee University, Yongin

Volume :

fYear :

2008

fDate :

3-5 Nov. 2008

Firstpage :

Lastpage :

Abstract :

This paper proposes a simulation-based algorithm for optimizing the average reward in a parameterized continuous-time, finite-state semi-Markov decision process (SMDP). Our contributions are twofold. First, we compute the approximate gradient of the average reward with respect to the parameters of an SMDP controlled by parameterized stochastic policies; a stochastic gradient ascent method is then used to adjust the parameters so as to optimize the average reward. Second, we present a simulation-based algorithm, GSMDP, that estimates the approximate gradient of the average reward using only a single sample path of the underlying Markov chain. We prove that this estimate converges almost surely to the true gradient of the average reward as the number of iterations goes to infinity.
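The abstract does not spell out the GSMDP update rules, so the following is only a minimal Python sketch of the general idea under stated assumptions, not the authors' algorithm. It assumes a tabular softmax policy, a lump reward r(s, a) per decision, and exponentially distributed sojourn times with mean tau(s, a), and it swaps in a GPOMDP-style eligibility-trace estimator (Baxter and Bartlett) run on a single sample path. Because the SMDP average reward is reward per unit time, eta = R / T with R and T the per-decision averages of reward and sojourn time, the sketch combines the two trace estimates with the ratio rule grad eta = (grad R - eta * grad T) / T. The functions `softmax_rows`, `exact_average_reward` and `estimate_gradient`, the trace discount `beta`, and the toy two-state model are all hypothetical.

```python
import numpy as np


def softmax_rows(theta):
    """Tabular softmax policy (assumption): row s holds mu(.|s; theta)."""
    e = np.exp(theta - theta.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)


def exact_average_reward(theta, P, r, tau):
    """eta = sum_s pi(s) sum_a mu(a|s) r(s,a) / sum_s pi(s) sum_a mu(a|s) tau(s,a),
    where pi is the stationary distribution of the embedded jump chain under mu."""
    mu = softmax_rows(theta)
    P_mu = np.einsum("sa,sat->st", mu, P)  # embedded-chain transitions under mu
    w, v = np.linalg.eig(P_mu.T)            # left eigenvector for eigenvalue 1
    pi = np.real(v[:, np.argmin(np.abs(w - 1.0))])
    pi = pi / pi.sum()
    return (pi @ (mu * r).sum(axis=1)) / (pi @ (mu * tau).sum(axis=1))


def estimate_gradient(theta, P, r, tau, n_steps=100_000, beta=0.9, seed=0):
    """Single-sample-path likelihood-ratio estimate of grad eta (GPOMDP-style trace)."""
    rng = np.random.default_rng(seed)
    n_s, n_a = r.shape
    mu = softmax_rows(theta)                # theta is held fixed while estimating
    cum_mu = np.cumsum(mu, axis=1)
    cum_P = np.cumsum(P, axis=2)
    u_a, u_s = rng.random(n_steps), rng.random(n_steps)
    sojourn = rng.exponential(1.0, n_steps)  # unit mean, scaled by tau(s, a) below
    z = np.zeros_like(theta)                # discounted trace of score functions
    g_r = np.zeros_like(theta)              # running sum of r_t * z_t
    g_t = np.zeros_like(theta)              # running sum of dt_t * z_t
    total_r = total_t = 0.0
    s = 0
    for t in range(n_steps):
        a = min(int(np.searchsorted(cum_mu[s], u_a[t])), n_a - 1)
        z *= beta
        z[s] -= mu[s]                       # grad log mu(a|s) = e_a - mu(.|s) in row s
        z[s, a] += 1.0
        r_t, dt = r[s, a], tau[s, a] * sojourn[t]
        g_r += r_t * z
        g_t += dt * z
        total_r += r_t
        total_t += dt
        s = min(int(np.searchsorted(cum_P[s, a], u_s[t])), n_s - 1)
    eta_hat = total_r / total_t
    # Ratio rule: grad(R / T) = (grad R - eta * grad T) / T, per-decision averages.
    grad_hat = (g_r / n_steps - eta_hat * g_t / n_steps) / (total_t / n_steps)
    return eta_hat, grad_hat


# Hypothetical toy SMDP (assumption, not from the paper): P[s, a, s'].
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.7, 0.3], [0.1, 0.9]]])
r = np.array([[1.0, 0.0], [0.5, 2.0]])    # lump reward for choosing a in s
tau = np.array([[1.0, 0.5], [2.0, 1.0]])  # mean sojourn time of (s, a)
theta = np.zeros((2, 2))

eta_hat, grad_hat = estimate_gradient(theta, P, r, tau)

# Central finite differences on the exact eta, used only as a sanity check.
eps = 1e-5
fd = np.zeros_like(theta)
for idx in np.ndindex(theta.shape):
    d = np.zeros_like(theta)
    d[idx] = eps
    fd[idx] = (exact_average_reward(theta + d, P, r, tau)
               - exact_average_reward(theta - d, P, r, tau)) / (2 * eps)

# One stochastic gradient ascent step with a hypothetical step size of 0.5.
theta_new = theta + 0.5 * grad_hat
```

With beta < 1 the trace gives a biased approximation that approaches the true gradient as beta goes to 1, at the cost of higher variance. The single ascent step at the end only illustrates how such an estimate would drive the parameter update; a practical learner would repeat estimation and update many times.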

Keywords :

Markov processes; decision making; dynamic programming; problem solving; semi-Markov decision process; simulation-based algorithm; stochastic gradient ascent method; Approximation algorithms; Artificial intelligence; Computational modeling; Convergence; Costs; Dynamic programming; Function approximation; Laboratories; Optimization methods; Stochastic processes;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Tools with Artificial Intelligence, 2008. ICTAI '08. 20th IEEE International Conference on

Conference_Location :

Dayton, OH

ISSN :

1082-3409

Print_ISBN :

978-0-7695-3440-4

Type :

conf

DOI :

10.1109/ICTAI.2008.51

Filename :

4669749

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3336858