Title :
Convergence Results for Some Temporal Difference Methods Based on Least Squares
Author :
Yu, Huizhen ; Bertsekas, Dimitri P.
Date :
7/1/2009
Abstract :
We consider finite-state Markov decision processes, and prove convergence and rate of convergence results for certain least squares policy evaluation algorithms of the type known as LSPE(λ). These are temporal difference methods for constructing a linear function approximation of the cost function of a stationary policy, within the context of infinite-horizon discounted and average cost dynamic programming. We introduce an average cost method, patterned after the known discounted cost method, and we prove its convergence for a range of constant stepsize choices. We also show that the convergence rate of both the discounted and the average cost methods is optimal within the class of temporal difference methods. Analysis and experiment indicate that our methods are substantially and often dramatically faster than TD(λ), as well as more reliable.
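To illustrate the algorithm family the abstract refers to, the following is a minimal simulation-based sketch of the discounted-cost LSPE(λ) iteration with linear value approximation V(x) ≈ φ(x)ᵀr. It is not the paper's implementation: the sampling interface `sample_transition`, the feature map `phi`, the regularization term `reg`, and all parameter names are illustrative assumptions.

```python
import numpy as np

def lspe_lambda(sample_transition, phi, alpha, lam, n_features,
                num_steps, stepsize=1.0, x0=0, reg=1e-6):
    """Sketch of simulation-based LSPE(lambda) for discounted-cost
    policy evaluation with a linear approximation V(x) ~ phi(x) @ r.

    sample_transition(x) -> (x_next, cost): one step of the Markov
    chain under the fixed policy (assumed interface, not from the paper).
    phi(x) -> feature vector of state x.
    alpha: discount factor; lam: the lambda parameter;
    stepsize: constant stepsize; reg: small regularization (assumption).
    """
    r = np.zeros(n_features)                 # weight vector
    z = np.zeros(n_features)                 # eligibility trace
    B = np.zeros((n_features, n_features))   # running sum of phi phi^T
    A = np.zeros((n_features, n_features))   # running sum of z (alpha*phi' - phi)^T
    b = np.zeros(n_features)                 # running sum of cost * z

    x = x0
    for t in range(num_steps):
        x_next, cost = sample_transition(x)
        f, f_next = phi(x), phi(x_next)

        z = alpha * lam * z + f              # update eligibility trace
        B += np.outer(f, f)
        A += np.outer(z, alpha * f_next - f)
        b += cost * z

        # LSPE(lambda) update: r <- r + stepsize * B^{-1} (A r + b);
        # the common 1/(t+1) normalization of A, B, b cancels here.
        B_inv = np.linalg.inv(B + reg * np.eye(n_features))
        r = r + stepsize * B_inv @ (A @ r + b)

        x = x_next
    return r
```

In this sketch the matrix inverse is recomputed each step for clarity; a practical implementation would update B⁻¹ incrementally (e.g., via the Sherman–Morrison formula).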
Keywords :
Markov processes; finite-state Markov decision process; decision theory; dynamic programming; temporal difference method; LSPE(λ); least squares policy evaluation algorithm; least squares approximation; least squares methods; linear function approximation; function approximation; approximation algorithms; approximation methods; cost function; discounted cost method; average cost method; convergence of numerical methods; convergence rate; difference equations; iterative algorithms
Journal_Title :
IEEE Transactions on Automatic Control
DOI :
10.1109/TAC.2009.2022097