DocumentCode :
2831817
Title :
Learning optimal values from random walk
Author :
Lam, K.P.
Author_Institution :
Dept. of Syst. Eng. & Eng. Manage., Chinese Univ. of Hong Kong
fYear :
2005
fDate :
16-16 Nov. 2005
Lastpage :
339
Abstract :
In this paper we extend the random walk example of Sutton and Barto (1998) to a multistage dynamic programming optimization setting with discounted reward. Using Bellman equations on a presumed action, the optimal values are derived for a general transition probability rho and discount rate gamma, recovering the original random walk as a special case. Temporal difference methods with eligibility traces, TD(λ), are effective in predicting the optimal values for different rho and gamma; however, their performance is found to depend critically on the choice of truncated return in the formulation when gamma is less than 1.
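Illustrative sketch (not from the paper): a minimal tabular TD(λ) predictor with accumulating eligibility traces on the five-state Sutton and Barto random walk, generalized to a right-move probability rho and discount gamma as the abstract describes. The function name td_lambda, the +1 reward at the right terminal, and all default parameter values are assumptions for illustration only, not the authors' formulation.

import random

N_STATES = 5                 # non-terminal states 1..5; 0 and 6 are terminal
RIGHT_TERMINAL = N_STATES + 1

def td_lambda(episodes=1000, alpha=0.05, gamma=0.9, lam=0.8, rho=0.5, seed=0):
    """Tabular TD(lambda) prediction with accumulating eligibility traces
    on a random walk with right-move probability rho and discount gamma
    (illustrative defaults, not the paper's settings)."""
    rng = random.Random(seed)
    V = [0.0] * (N_STATES + 2)           # value estimates; terminals stay 0
    for _ in range(episodes):
        e = [0.0] * (N_STATES + 2)       # eligibility traces, reset each episode
        s = (N_STATES + 1) // 2          # start in the centre state
        while 0 < s < RIGHT_TERMINAL:
            s_next = s + 1 if rng.random() < rho else s - 1
            r = 1.0 if s_next == RIGHT_TERMINAL else 0.0   # reward only at right terminal (assumed)
            delta = r + gamma * V[s_next] - V[s]           # one-step TD error
            e[s] += 1.0                                    # accumulating trace for visited state
            for i in range(1, RIGHT_TERMINAL):
                V[i] += alpha * delta * e[i]               # update all states via their traces
                e[i] *= gamma * lam                        # decay traces
            s = s_next
    return V[1:RIGHT_TERMINAL]

if __name__ == "__main__":
    print(td_lambda())       # learned value estimates for states 1..5

For gamma = 1 and rho = 0.5 the learned values approach the familiar 1/6, ..., 5/6 of the original example; changing rho and gamma illustrates the discounted, general-probability setting the abstract studies.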
Keywords :
dynamic programming; learning (artificial intelligence); random processes; Bellman equations; discounted reward; eligibility traces; general transition probability; multistage dynamic programming optimization; optimal value learning; random walk; temporal difference methods; Artificial intelligence; Drugs; Dynamic programming; Equations; Hardware; Learning; Neurodynamics; Research and development management; Systems engineering and theory; Very large scale integration;
fLanguage :
English
Publisher :
ieee
Conference_Title :
Tools with Artificial Intelligence, 2005. ICTAI 05. 17th IEEE International Conference on
Conference_Location :
Hong Kong
ISSN :
1082-3409
Print_ISBN :
0-7695-2488-5
Type :
conf
DOI :
10.1109/ICTAI.2005.81
Filename :
1562957