Title :
Learning optimal values from random walk
Author_Institution :
Dept. of Syst. Eng. & Eng. Manage., Chinese Univ. of Hong Kong
Abstract :
In this paper we extend the random walk example of Sutton and Barto (1998) to a multistage dynamic programming optimization setting with discounted reward. Using the Bellman equations under the presumed actions, the optimal values are derived for a general transition probability rho and discount rate gamma, and they include the original random walk as a special case. Temporal difference methods with eligibility traces, TD(lambda), are effective in predicting the optimal values for different rho and gamma, but their performance is found to depend critically on the choice of truncated return in the formulation when gamma is less than 1.
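The abstract describes tabular TD(lambda) prediction on a generalised random walk with right-move probability rho and discount rate gamma. Below is a minimal sketch of that setup, assuming the Sutton-Barto reward scheme (+1 on the right exit, 0 elsewhere) and accumulating eligibility traces; the function name, parameter defaults, and episode count are illustrative assumptions, not details taken from the paper.

import numpy as np

def td_lambda_random_walk(n_states=5, rho=0.5, gamma=1.0, lam=0.8,
                          alpha=0.1, episodes=2000, seed=0):
    rng = np.random.default_rng(seed)
    V = np.zeros(n_states + 2)              # indices 0 and n_states+1 are terminal
    for _ in range(episodes):
        e = np.zeros_like(V)                # accumulating eligibility traces
        s = (n_states + 1) // 2             # start from the middle state
        while 1 <= s <= n_states:
            s_next = s + 1 if rng.random() < rho else s - 1
            r = 1.0 if s_next == n_states + 1 else 0.0    # +1 only for the right exit
            delta = r + gamma * V[s_next] - V[s]          # TD error
            e[s] += 1.0
            V += alpha * delta * e                        # update every traced state
            e *= gamma * lam                              # decay the traces
            s = s_next
        V[0] = V[n_states + 1] = 0.0        # terminal values stay zero
    return V[1:n_states + 1]                # learned values of non-terminal states

if __name__ == "__main__":
    print(td_lambda_random_walk())          # ~[1/6, 2/6, 3/6, 4/6, 5/6] for rho=0.5, gamma=1

For rho = 0.5 and gamma = 1 the learned values should approach 1/6, 2/6, ..., 5/6, the known solution of the original five-state walk; other rho and gamma settings illustrate the generalised case the abstract refers to.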
Keywords :
dynamic programming; learning (artificial intelligence); random processes; Bellman equations; discounted reward; eligibility traces; general transition probability; multistage dynamic programming optimization; optimal value learning; random walk; temporal difference methods;
Conference_Title :
17th IEEE International Conference on Tools with Artificial Intelligence (ICTAI 2005)
Conference_Location :
Hong Kong
Print_ISBN :
0-7695-2488-5
DOI :
10.1109/ICTAI.2005.81