DocumentCode :
3613838
Title :
On the mean-square rate of convergence of temporal-difference learning algorithms
Author :
V.B. Tadic
Author_Institution :
Dept. of Electr. & Electron. Eng., Melbourne Univ., Parkville, Vic., Australia
Volume :
2
fYear :
2002
fDate :
2002
Firstpage :
1454
Abstract :
In this paper, the mean-square rate of convergence of temporal-difference learning algorithms is analyzed. The analysis is carried out for the case of a discounted cost function associated with a Markov chain with a finite-dimensional state space. Under mild conditions, it is shown that these algorithms converge at the rate O(n^{-1/2}). The results are illustrated with examples related to random coefficient autoregression models and M/G/1 queues.
Keywords :
"Convergence","Cost function","Function approximation","Algorithm design and analysis","Automatic control","Approximation error","Stochastic processes","Predictive models","Performance analysis","Australia Council"
Publisher :
IEEE
Conference_Titel :
Proceedings of the 2002 American Control Conference
ISSN :
0743-1619
Print_ISBN :
0-7803-7298-0
Type :
conf
DOI :
10.1109/ACC.2002.1023226
Filename :
1023226