• DocumentCode
    2030031
  • Title
    Average cost temporal-difference learning
  • Author
    Tsitsiklis, John N.; Van Roy, Benjamin
  • Author_Institution
    Lab. for Inf. & Decision Syst., MIT, Cambridge, MA, USA
  • Volume
    1
  • fYear
    1997
  • fDate
    10-12 Dec 1997
  • Firstpage
    498
  • Abstract
    We describe a variant of temporal-difference learning that approximates average and differential costs of an irreducible aperiodic Markov chain. Approximations consist of linear combinations of fixed basis functions whose weights are incrementally updated during a single endless trajectory of the Markov chain. We present results concerning convergence and the limit of convergence. We also provide a bound on the resulting approximation error that exhibits an interesting dependence on the “mixing time” of the Markov chain. The results parallel previous work by the authors (1997) involving approximations of discounted cost-to-go.
  • Keywords
    Markov processes; convergence; decision theory; dynamic programming; learning (artificial intelligence); approximation error; average cost temporal-difference learning; discounted cost-to-go; fixed basis functions; irreducible aperiodic Markov chain; Cost function; Heuristic algorithms; Iterative algorithms; Laboratories; Poisson equations; State-space methods
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Proceedings of the 36th IEEE Conference on Decision and Control, 1997
  • Conference_Location
    San Diego, CA
  • ISSN
    0191-2216
  • Print_ISBN
    0-7803-4187-2
  • Type
    conf
  • DOI
    10.1109/CDC.1997.650675
  • Filename
    650675