• DocumentCode
    3743854
  • Title

    An empirical algorithm for relative value iteration for average-cost MDPs

  • Author

    Abhishek Gupta;Rahul Jain;Peter W. Glynn

  • Author_Institution
    USC Ming Hsieh Department of Electrical Engineering at University of Southern California, Los Angeles, United States of America
  • fYear
    2015
  • Firstpage
    5079
  • Lastpage
    5084
  • Abstract
    Infinite-horizon average-cost Markov decision processes are of interest in many scenarios. A dynamic programming algorithm, called the relative value iteration, is used to compute the optimal value function. For large state spaces, this often runs into difficulties due to its computational burden. We propose a simulation-based dynamic program called empirical relative value iteration (ERVI). The idea is very simple: replace the expectation in the Bellman operator with a sample average estimate, and then use projection to ensure boundedness of the iterates. We establish that the ERVI iterates converge to the optimal value function in the span-seminorm in probability as the number of samples taken goes to infinity. Simulation results show remarkably good performance even with a small number of samples.
  • Keywords
    "Heuristic algorithms","Convergence","Markov processes","Approximation algorithms","Dynamic programming","Decision making","Computational complexity"
  • Publisher
    ieee
  • Conference_Titel
    Decision and Control (CDC), 2015 IEEE 54th Annual Conference on
  • Type

    conf

  • DOI
    10.1109/CDC.2015.7403014
  • Filename
    7403014