• DocumentCode
    116277
  • Title
    Empirical policy iteration for approximate dynamic programming
  • Author
    Haskell, William B.; Jain, Rahul; Kalathil, Dileep
  • Author_Institution
    Dept. of Electr. Eng., Univ. of Southern California, Los Angeles, CA, USA
  • fYear
    2014
  • fDate
    15-17 Dec. 2014
  • Firstpage
    6573
  • Lastpage
    6578
  • Abstract
    We propose a simulation-based algorithm, Empirical Policy Iteration (EPI), for finding the optimal policy of an MDP with an infinite-horizon discounted cost criterion when the transition kernels are unknown. Unlike simulation-based algorithms that use stochastic approximation techniques and give only asymptotic convergence results, we give provable, non-asymptotic performance guarantees in the form of sample complexity results: given ε > 0 and δ > 0, we specify the minimum number of simulation samples n(ε, δ) needed in each iteration and the minimum number of iterations k(ε, δ) that are sufficient for EPI to yield, with probability at least 1 − δ, an approximate value function that is ε-close to the optimal value function. (A minimal illustrative sketch of the EPI loop appears after this record.)
  • Keywords
    Markov processes; decision making; dynamic programming; EPI algorithm; MDP; Markov decision processes; approximate dynamic programming; empirical policy iteration; infinite horizon discounted cost criteria; nonasymptotic performance guarantees; optimal policy function; optimal value function; simulation based algorithm; transition kernels; Approximation algorithms; Approximation methods; Convergence; Dynamic programming; Heuristic algorithms; Markov processes;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    2014 IEEE 53rd Annual Conference on Decision and Control (CDC)
  • Conference_Location
    Los Angeles, CA
  • Print_ISBN
    978-1-4799-7746-8
  • Type
    conf
  • DOI
    10.1109/CDC.2014.7040420
  • Filename
    7040420
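
The abstract describes EPI only at a high level. The sketch below shows how an empirical policy iteration loop might look for a finite MDP, assuming a simulator sample_next(s, a, size) for the unknown transition kernel and a known one-step cost array costs. The function name, parameters (n_samples, eval_sweeps, n_iterations), and overall structure are assumptions made for illustration; this is not the authors' implementation.

    import numpy as np

    def empirical_policy_iteration(sample_next, costs, n_states, n_actions,
                                   gamma=0.95, n_samples=100, n_iterations=20,
                                   eval_sweeps=50):
        """Illustrative empirical policy iteration (EPI) sketch.

        sample_next(s, a, size) should return an integer array of `size` next
        states drawn from the unknown transition kernel; costs[s, a] is the
        known one-step cost. All names and parameters here are assumptions
        for this sketch, not the paper's notation or code.
        """
        policy = np.zeros(n_states, dtype=int)          # arbitrary initial policy
        v = np.zeros(n_states)

        for _ in range(n_iterations):                   # plays the role of k(eps, delta)
            # Empirical policy evaluation: iterate an empirical Bellman operator
            # built from simulated transitions under the current policy.
            v = np.zeros(n_states)
            for _ in range(eval_sweeps):
                v_new = np.empty(n_states)
                for s in range(n_states):
                    a = policy[s]
                    nxt = sample_next(s, a, n_samples)  # plays the role of n(eps, delta)
                    v_new[s] = costs[s, a] + gamma * v[nxt].mean()
                v = v_new

            # Empirical policy improvement: greedy (cost-minimizing) with respect
            # to Q-values estimated from fresh simulation samples.
            for s in range(n_states):
                q = np.empty(n_actions)
                for a in range(n_actions):
                    nxt = sample_next(s, a, n_samples)
                    q[a] = costs[s, a] + gamma * v[nxt].mean()
                policy[s] = int(np.argmin(q))

        return policy, v

    # Example usage on a toy 2-state, 2-action MDP whose kernel is hidden
    # behind the sampler (names and numbers are illustrative only):
    rng = np.random.default_rng(0)
    P = np.array([[[0.9, 0.1], [0.2, 0.8]],
                  [[0.5, 0.5], [0.1, 0.9]]])            # P[s, a, s']
    costs = np.array([[1.0, 0.5], [0.2, 1.5]])
    sample_next = lambda s, a, size: rng.choice(2, size=size, p=P[s, a])
    policy, v = empirical_policy_iteration(sample_next, costs, n_states=2, n_actions=2)

The eval_sweeps loop approximates the fixed point of an empirical Bellman operator for the current policy; in the paper, the per-iteration sample count n(ε, δ) and the iteration count k(ε, δ) are chosen so that the resulting value function is ε-close to optimal with probability at least 1 − δ.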