• DocumentCode
    2575768
  • Title

    Pathologies of temporal difference methods in approximate dynamic programming

  • Author

    Bertsekas, Dimitri P.

  • Author_Institution
    Dept. of Electr. Eng. & Comp. Sci., M.I.T., Cambridge, MA, USA
  • fYear
    2010
  • fDate
    15-17 Dec. 2010
  • Firstpage
    3034
  • Lastpage
    3039
  • Abstract
    Approximate policy iteration methods based on temporal differences are popular in practice, and have been tested extensively, dating to the early nineties, but the associated convergence behavior is complex, and not well understood at present. An important question is whether the policy iteration process is seriously hampered by oscillations between poor policies, roughly similar to the attraction of gradient methods to poor local minima. There has been little apparent concern in the approximate DP/reinforcement learning literature about this possibility, even though it has been documented with several simple examples. Recent computational experimentation with the game of tetris, a popular testbed for approximate DP algorithms over a 15-year period, has brought the issue to sharp focus. In particular, using a standard set of 22 features and temporal difference methods, an average score of a few thousands was achieved. Using the same features and a random search method, an overwhelmingly better average score was achieved (600,000-900,000). The paper explains the likely mechanism of this phenomenon, and derives conditions under which it will not occur.
  • Keywords
    dynamic programming; iterative methods; random processes; search problems; approximate dynamic programming; approximate policy iteration method; random search method; temporal difference method; Aggregates; Convergence; Equations; Function approximation; Mathematical model; Oscillators;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Decision and Control (CDC), 2010 49th IEEE Conference on
  • Conference_Location
    Atlanta, GA
  • ISSN
    0743-1546
  • Print_ISBN
    978-1-4244-7745-6
  • Type

    conf

  • DOI
    10.1109/CDC.2010.5717644
  • Filename
    5717644