Title :
Learning optimal values from random walk
Author_Institution :
Dept. of Syst. Eng. & Eng. Manage., Chinese Univ. of Hong Kong
Abstract :
In this paper we extend the random walk example of Sutton and Barto (1998) to a multistage dynamic programming optimization setting with discounted reward. Using the Bellman equations under the presumed actions, the optimal values are derived for a general transition probability rho and discount rate gamma, and they include the original random walk as a special case. Temporal difference methods with eligibility traces, TD(lambda), are effective in predicting the optimal values for different rho and gamma, but their performance is found to depend critically on the choice of truncated return in the formulation when gamma is less than 1.
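The abstract describes tabular TD(lambda) prediction on a generalised random walk with right-move probability rho and discount rate gamma. Below is a minimal sketch of that setup, assuming the Sutton-Barto reward scheme (+1 on the right exit, 0 elsewhere) and accumulating eligibility traces; the function name, parameter defaults, and episode count are illustrative assumptions, not details taken from the paper.

import numpy as np

def td_lambda_random_walk(n_states=5, rho=0.5, gamma=1.0, lam=0.8,
                          alpha=0.1, episodes=2000, seed=0):
    rng = np.random.default_rng(seed)
    V = np.zeros(n_states + 2)              # indices 0 and n_states+1 are terminal
    for _ in range(episodes):
        e = np.zeros_like(V)                # accumulating eligibility traces
        s = (n_states + 1) // 2             # start from the middle state
        while 1 <= s <= n_states:
            s_next = s + 1 if rng.random() < rho else s - 1
            r = 1.0 if s_next == n_states + 1 else 0.0    # +1 only for the right exit
            delta = r + gamma * V[s_next] - V[s]          # TD error
            e[s] += 1.0
            V += alpha * delta * e                        # update every traced state
            e *= gamma * lam                              # decay the traces
            s = s_next
        V[0] = V[n_states + 1] = 0.0        # terminal values stay zero
    return V[1:n_states + 1]                # learned values of non-terminal states

if __name__ == "__main__":
    print(td_lambda_random_walk())          # ~[1/6, 2/6, 3/6, 4/6, 5/6] for rho=0.5, gamma=1

For rho = 0.5 and gamma = 1 the learned values should approach 1/6, 2/6, ..., 5/6, the known solution of the original five-state walk; other rho and gamma settings illustrate the generalised case the abstract refers to.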
Keywords :
dynamic programming; learning (artificial intelligence); random processes; Bellman equations; discounted reward; eligibility traces; general transition probability; multistage dynamic programming optimization; optimal value learning; random walk; temporal difference methods;
Conference_Title :
17th IEEE International Conference on Tools with Artificial Intelligence (ICTAI 2005)
Conference_Location :
Hong Kong
Print_ISBN :
0-7695-2488-5
DOI :
10.1109/ICTAI.2005.81