• DocumentCode
    1743644
  • Title

    Approximate value iteration with randomized policies

  • Author

    De Farias, D.P. ; Van Roy, B.

  • Author_Institution
    Stanford Univ., CA, USA
  • Volume
    4
  • fYear
    2000
  • fDate
    2000
  • Firstpage
    3421
  • Abstract
    The curse of dimensionality in dynamic programming prevents, in most problems of practical interest, the exact computation of the value function. We study the fixed points of approximate value iteration, a simple algorithm that combats the curse of dimensionality by generating approximate iterates of the classical value iteration algorithm in the span of a set of prespecified basis functions. We show that, in general, the modified dynamic programming operator need not possess a fixed point, and therefore, approximate value iteration should not be expected to converge. However, by using a class of randomized policies, approximate value iteration is guaranteed to possess at least one fixed point. We finally discuss the link between approximate value iteration and temporal-difference learning (TD), and show that the existence of fixed points for approximate value iteration implies existence of stationary points for the ordinary differential equation approximated by a version of TD that incorporates “exploration”
  • Keywords
    Markov processes; dynamic programming; function approximation; iterative methods; learning (artificial intelligence); approximate value iteration; modified dynamic programming operator; randomized policies; temporal-difference learning; Control systems; Differential equations; Dynamic programming; State-space methods; Stochastic processes;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Decision and Control, 2000. Proceedings of the 39th IEEE Conference on
  • Conference_Location
    Sydney, NSW
  • ISSN
    0191-2216
  • Print_ISBN
    0-7803-6638-7
  • Type

    conf

  • DOI
    10.1109/CDC.2000.912232
  • Filename
    912232