Regional Information Center for Science and Technology - Convergence of Model-Based Temporal Difference Learning for Control

DocumentCode :

2717173

Title :

Convergence of Model-Based Temporal Difference Learning for Control

Author :

Van Hasselt, Hado ; Wiering, Marco A.

Author_Institution :

Dept. of Inf. & Comput. Sci., Utrecht Univ.

fYear :

2007

fDate :

1-5 April 2007

Firstpage :

Lastpage :

Abstract :

A theoretical analysis of model-based temporal difference learning for control is given, leading to a proof of convergence. This work differs from earlier work on the convergence of temporal difference learning by proving convergence to the optimal value function. That is, the method does not merely find the values of the current policy; it updates the policy in such a manner that the optimal policy is ultimately guaranteed to be reached.

Keywords :

convergence; learning (artificial intelligence); optimal control; optimal value function; proof of convergence; temporal difference learning; Dynamic programming; Intelligent systems; Learning; Stochastic processes; Telephony

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Approximate Dynamic Programming and Reinforcement Learning, 2007. ADPRL 2007. IEEE International Symposium on

Conference_Location :

Honolulu, HI

Print_ISBN :

1-4244-0706-0

Type :

conf

DOI :

10.1109/ADPRL.2007.368170

Filename :

4220815

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2717173