Title :
Convergence of Model-Based Temporal Difference Learning for Control
Author :
Van Hasselt, Hado ; Wiering, Marco A.
Author_Institution :
Dept. of Inf. & Comput. Sci., Utrecht Univ.
Abstract :
A theoretical analysis of model-based temporal difference learning for control is given, leading to a proof of convergence. This work differs from earlier work on the convergence of temporal difference learning by proving convergence to the optimal value function. That is, the values of the current policy are not what is found; instead, the policy is updated so that the optimal policy is ultimately guaranteed to be reached.
Keywords :
convergence; learning (artificial intelligence); optimal control; optimal value function; proof of convergence; temporal difference learning; Convergence; Dynamic programming; Intelligent systems; Learning; Stochastic processes; Telephony
Conference_Title :
Approximate Dynamic Programming and Reinforcement Learning, 2007. ADPRL 2007. IEEE International Symposium on
Conference_Location :
Honolulu, HI
Print_ISBN :
1-4244-0706-0
DOI :
10.1109/ADPRL.2007.368170