DocumentCode
2030031
Title
Average cost temporal-difference learning
Author
Tsitsiklis, John N. ; Van Roy, Benjamin
Author_Institution
Lab. for Inf. & Decision Syst., MIT, Cambridge, MA, USA
Volume
1
fYear
1997
fDate
10-12 Dec 1997
Firstpage
498
Abstract
We describe a variant of temporal-difference learning that approximates average and differential costs of an irreducible aperiodic Markov chain. Approximations consist of linear combinations of fixed basis functions whose weights are incrementally updated during a single endless trajectory of the Markov chain. We present results concerning convergence and the limit of convergence. We also provide a bound on the resulting approximation error that exhibits an interesting dependence on the “mixing time” of the Markov chain. The results parallel previous work by the authors (1997) involving approximations of discounted cost-to-go.
Keywords
Markov processes; convergence; decision theory; dynamic programming; learning (artificial intelligence); approximation error; average cost temporal-difference learning; discounted cost-to-go; fixed basis functions; irreducible aperiodic Markov chain; Cost function; Heuristic algorithms; Iterative algorithms; Laboratories; Poisson equations; State-space methods
fLanguage
English
Publisher
IEEE
Conference_Titel
Decision and Control, 1997., Proceedings of the 36th IEEE Conference on
Conference_Location
San Diego, CA
ISSN
0191-2216
Print_ISBN
0-7803-4187-2
Type
conf
DOI
10.1109/CDC.1997.650675
Filename
650675