DocumentCode
1743644
Title
Approximate value iteration with randomized policies
Author
De Farias, D.P. ; Van Roy, B.
Author_Institution
Stanford Univ., CA, USA
Volume
4
fYear
2000
fDate
2000
Firstpage
3421
Abstract
The curse of dimensionality in dynamic programming prevents, in most problems of practical interest, the exact computation of the value function. We study the fixed points of approximate value iteration, a simple algorithm that combats the curse of dimensionality by generating approximate iterates of the classical value iteration algorithm in the span of a set of prespecified basis functions. We show that, in general, the modified dynamic programming operator need not possess a fixed point, and therefore, approximate value iteration should not be expected to converge. However, by using a class of randomized policies, approximate value iteration is guaranteed to possess at least one fixed point. We finally discuss the link between approximate value iteration and temporal-difference learning (TD), and show that the existence of fixed points for approximate value iteration implies existence of stationary points for the ordinary differential equation approximated by a version of TD that incorporates “exploration”
Keywords
Markov processes; dynamic programming; function approximation; iterative methods; learning (artificial intelligence); approximate value iteration; modified dynamic programming operator; randomized policies; temporal-difference learning; Control systems; Differential equations; Dynamic programming; State-space methods; Stochastic processes;
fLanguage
English
Publisher
ieee
Conference_Titel
Decision and Control, 2000. Proceedings of the 39th IEEE Conference on
Conference_Location
Sydney, NSW
ISSN
0191-2216
Print_ISBN
0-7803-6638-7
Type
conf
DOI
10.1109/CDC.2000.912232
Filename
912232
Link To Document