• DocumentCode
    1538378
  • Title

    An analysis of temporal-difference learning with function approximation

  • Author

    Tsitsiklis, John N. ; Van Roy, Benjamin

  • Author_Institution
    Lab. for Inf. & Decision Syst., MIT, Cambridge, MA, USA
  • Volume
    42
  • Issue
    5
  • fYear
    1997
  • fDate
    5/1/1997 12:00:00 AM
  • Firstpage
    674
  • Lastpage
    690
  • Abstract
    We discuss the temporal-difference learning algorithm, as applied to approximating the cost-to-go function of an infinite-horizon discounted Markov chain. The algorithm we analyze updates parameters of a linear function approximator online during a single endless trajectory of an irreducible aperiodic Markov chain with a finite or infinite state space. We present a proof of convergence (with probability one), a characterization of the limit of convergence, and a bound on the resulting approximation error. Furthermore, our analysis is based on a new line of reasoning that provides new intuition about the dynamics of temporal-difference learning. In addition to proving new and stronger positive results than those previously available, we identify the significance of online updating and potential hazards associated with the use of nonlinear function approximators. First, we prove that divergence may occur when updates are not based on trajectories of the Markov chain. This fact reconciles positive and negative results that have been discussed in the literature, regarding the soundness of temporal-difference learning. Second, we present an example illustrating the possibility of divergence when temporal difference learning is used in the presence of a nonlinear function approximator
  • Keywords
    Markov processes; convergence; function approximation; learning (artificial intelligence); convergence; cost-to-go function; finite state space; infinite state space; infinite-horizon discounted Markov chain; irreducible aperiodic Markov chain; linear function approximation; nonlinear function approximators; temporal-difference learning; Algorithm design and analysis; Approximation algorithms; Approximation error; Convergence; Cost function; Dynamic programming; Error analysis; Function approximation; Linear approximation; State-space methods;
  • fLanguage
    English
  • Journal_Title
    Automatic Control, IEEE Transactions on
  • Publisher
    ieee
  • ISSN
    0018-9286
  • Type

    jour

  • DOI
    10.1109/9.580874
  • Filename
    580874