DocumentCode
116277
Title
Empirical policy iteration for approximate dynamic programming
Author
Haskell, William B.; Jain, Rahul; Kalathil, Dileep
Author_Institution
Dept. of Electr. Eng., Univ. of Southern California, Los Angeles, CA, USA
fYear
2014
fDate
15-17 Dec. 2014
Firstpage
6573
Lastpage
6578
Abstract
We propose a simulation-based algorithm, Empirical Policy Iteration (EPI), for finding the optimal policy of an MDP under the infinite-horizon discounted-cost criterion when the transition kernels are unknown. Unlike simulation-based algorithms that rely on stochastic approximation techniques and give only asymptotic convergence results, we give provable, non-asymptotic performance guarantees in the form of sample complexity results: given ε > 0 and δ > 0, we specify the minimum number of simulation samples n(ε, δ) needed in each iteration and the minimum number of iterations k(ε, δ) that are sufficient for EPI to yield, with probability at least 1-δ, an approximate value function that is within ε of the optimal value function.
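Illustrative sketch (not from the paper): the abstract describes replacing exact policy evaluation and improvement with their empirical, simulation-based counterparts. The Python snippet below sketches that idea under several assumptions of ours: a hypothetical simulator(s, a, size) interface returning i.i.d. next-state samples from the unknown kernel, a tabular cost array cost[s, a], a fixed number of inner evaluation sweeps as a stand-in for whatever stopping rule the paper prescribes, and parameters n_samples and n_iterations playing the roles of n(ε, δ) and k(ε, δ).

    import numpy as np

    def empirical_policy_iteration(simulator, n_states, n_actions, cost,
                                   gamma, n_samples, n_iterations):
        """Minimal sketch of empirical policy iteration (EPI).

        Assumption: simulator(s, a, size) returns `size` i.i.d. next-state
        indices drawn from the unknown transition kernel P(. | s, a).
        """
        policy = np.zeros(n_states, dtype=int)   # arbitrary initial policy
        v = np.zeros(n_states)                   # value-function estimate

        for _ in range(n_iterations):            # plays the role of k(eps, delta)
            # Empirical policy evaluation: repeatedly apply the empirical
            # Bellman operator for the current policy, with n_samples
            # simulated next states per state (heuristic 50 sweeps).
            for _ in range(50):
                v_new = np.empty_like(v)
                for s in range(n_states):
                    a = policy[s]
                    nxt = simulator(s, a, n_samples)
                    v_new[s] = cost[s, a] + gamma * v[nxt].mean()
                v = v_new

            # Empirical policy improvement: greedy step with respect to the
            # sampled one-step look-ahead values (minimization, since the
            # criterion is discounted cost).
            for s in range(n_states):
                q = np.empty(n_actions)
                for a in range(n_actions):
                    nxt = simulator(s, a, n_samples)
                    q[a] = cost[s, a] + gamma * v[nxt].mean()
                policy[s] = int(np.argmin(q))

        return policy, v

The paper's contribution is the non-asymptotic analysis of such a scheme, i.e. how large n_samples and n_iterations must be to guarantee an ε-accurate value function with probability 1-δ; the sketch only conveys the algorithmic structure.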
Keywords
Markov processes; decision making; dynamic programming; EPI algorithm; MDP; Markov decision processes; approximate dynamic programming; empirical policy iteration; infinite horizon discounted cost criteria; nonasymptotic performance guarantees; optimal policy function; optimal value function; simulation based algorithm; transition kernels; Approximation algorithms; Approximation methods; Convergence; Dynamic programming; Heuristic algorithms; Markov processes
fLanguage
English
Publisher
ieee
Conference_Titel
Decision and Control (CDC), 2014 IEEE 53rd Annual Conference on
Conference_Location
Los Angeles, CA
Print_ISBN
978-1-4799-7746-8
Type
conf
DOI
10.1109/CDC.2014.7040420
Filename
7040420