DocumentCode
116277
Title
Empirical policy iteration for approximate dynamic programming
Author
Haskell, William B.; Jain, Rahul; Kalathil, Dileep
Author_Institution
Dept. of Electr. Eng., Univ. of Southern California, Los Angeles, CA, USA
fYear
2014
fDate
15-17 Dec. 2014
Firstpage
6573
Lastpage
6578
Abstract
We propose a simulation-based algorithm, Empirical Policy Iteration (EPI), for finding the optimal policy of an MDP under the infinite-horizon discounted-cost criterion when the transition kernels are unknown. Unlike simulation-based algorithms that rely on stochastic approximation techniques and give only asymptotic convergence results, we give provable, non-asymptotic performance guarantees in the form of sample complexity results: given ε > 0 and δ > 0, we specify the minimum number of simulation samples n(ε, δ) needed in each iteration and the minimum number of iterations k(ε, δ) that are sufficient for EPI to yield, with probability at least 1-δ, an approximate value function that is within ε of the optimal value function.
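Illustrative sketch (not from the paper): the abstract describes replacing exact policy evaluation and improvement with their empirical, simulation-based counterparts. The Python snippet below sketches that idea under several assumptions of ours: a hypothetical simulator(s, a, size) interface returning i.i.d. next-state samples from the unknown kernel, a tabular cost array cost[s, a], a fixed number of inner evaluation sweeps as a stand-in for whatever stopping rule the paper prescribes, and parameters n_samples and n_iterations playing the roles of n(ε, δ) and k(ε, δ).

    import numpy as np

    def empirical_policy_iteration(simulator, n_states, n_actions, cost,
                                   gamma, n_samples, n_iterations):
        """Minimal sketch of empirical policy iteration (EPI).

        Assumption: simulator(s, a, size) returns `size` i.i.d. next-state
        indices drawn from the unknown transition kernel P(. | s, a).
        """
        policy = np.zeros(n_states, dtype=int)   # arbitrary initial policy
        v = np.zeros(n_states)                   # value-function estimate

        for _ in range(n_iterations):            # plays the role of k(eps, delta)
            # Empirical policy evaluation: repeatedly apply the empirical
            # Bellman operator for the current policy, with n_samples
            # simulated next states per state (heuristic 50 sweeps).
            for _ in range(50):
                v_new = np.empty_like(v)
                for s in range(n_states):
                    a = policy[s]
                    nxt = simulator(s, a, n_samples)
                    v_new[s] = cost[s, a] + gamma * v[nxt].mean()
                v = v_new

            # Empirical policy improvement: greedy step with respect to the
            # sampled one-step look-ahead values (minimization, since the
            # criterion is discounted cost).
            for s in range(n_states):
                q = np.empty(n_actions)
                for a in range(n_actions):
                    nxt = simulator(s, a, n_samples)
                    q[a] = cost[s, a] + gamma * v[nxt].mean()
                policy[s] = int(np.argmin(q))

        return policy, v

The paper's contribution is the non-asymptotic analysis of such a scheme, i.e. how large n_samples and n_iterations must be to guarantee an ε-accurate value function with probability 1-δ; the sketch only conveys the algorithmic structure.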
Keywords
Markov processes; decision making; dynamic programming; EPI algorithm; MDP; Markov decision processes; approximate dynamic programming; empirical policy iteration; infinite horizon discounted cost criteria; nonasymptotic performance guarantees; optimal policy function; optimal value function; simulation based algorithm; transition kernels; Approximation algorithms; Approximation methods; Convergence; Dynamic programming; Heuristic algorithms; Markov processes
fLanguage
English
Publisher
ieee
Conference_Titel
Decision and Control (CDC), 2014 IEEE 53rd Annual Conference on
Conference_Location
Los Angeles, CA
Print_ISBN
978-1-4799-7746-8
Type
conf
DOI
10.1109/CDC.2014.7040420
Filename
7040420