Title :
Sample path-based policy-only learning by actor neural networks
Author_Institution :
Dept. of Ind. Eng. & Oper. Res., Univ. of California, Berkeley, CA, USA
Abstract :
This paper studies a sample-path-based, policy-only learning algorithm for regenerative stochastic processes proposed by Marbach and Tsitsiklis (1998). The algorithm seeks to optimize a randomized, parameterized policy with respect to the average cost criterion, in conjunction with the so-called infinitesimal perturbation analysis gradient estimation technique. We present numerical studies demonstrating this learning algorithm on small-scale problems; in particular, the parameterized policy (the "actor") is a neural network function approximator in the spirit of neuro-dynamic programming (reinforcement learning).
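Illustrative_Sketch :
The abstract summarizes the algorithm only at a high level. Below is a minimal sketch, in Python with NumPy, of a policy-only update of the kind described: within each regenerative cycle of a toy Markov chain, a score-function (likelihood-ratio) eligibility trace of grad log mu_theta(u|x) is accumulated for a small "actor" network, each step's contribution is weighted by (cost - average-cost estimate), and the actor parameters take a gradient step at every return to the regeneration state. The toy chain, network sizes, step sizes, and the use of the basic likelihood-ratio form of the regenerative gradient estimator are assumptions made for illustration, not details taken from the paper.

# Sketch of a sample-path-based, policy-only ("actor-only") update over
# regenerative cycles, in the spirit of Marbach & Tsitsiklis (1998).
# Toy problem and all constants below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

N_STATES, N_ACTIONS, HIDDEN = 5, 2, 8   # assumed toy problem sizes
REGEN_STATE = 0                          # regeneration state i*

# Tiny one-hidden-layer actor network: one-hot state -> action logits.
W1 = 0.1 * rng.standard_normal((HIDDEN, N_STATES))
W2 = 0.1 * rng.standard_normal((N_ACTIONS, HIDDEN))

def policy(state):
    """Randomized policy mu_theta(u | x): softmax over network logits."""
    x = np.zeros(N_STATES); x[state] = 1.0
    h = np.tanh(W1 @ x)
    logits = W2 @ h
    p = np.exp(logits - logits.max()); p /= p.sum()
    return p, x, h

def grad_log_policy(action, p, x, h):
    """Score function grad_theta log mu_theta(u | x), via backprop."""
    dlogits = -p; dlogits[action] += 1.0        # d/dlogits of log-softmax
    gW2 = np.outer(dlogits, h)
    dh = (W2.T @ dlogits) * (1.0 - h * h)       # back through tanh
    gW1 = np.outer(dh, x)
    return gW1, gW2

def step(state, action):
    """Assumed toy regenerative chain: action 0 drifts up (costly states),
    action 1 pays a small penalty but tends to reset toward i*."""
    cost = float(state) + (0.5 if action == 1 else 0.0)
    if action == 1 or rng.random() < 0.2:
        nxt = REGEN_STATE if rng.random() < 0.5 else max(state - 1, 0)
    else:
        nxt = min(state + 1, N_STATES - 1)
    return nxt, cost

gamma, avg_cost = 0.001, 0.0                 # step size; average-cost estimate
zW1 = np.zeros_like(W1); zW2 = np.zeros_like(W2)   # score-function traces
dW1 = np.zeros_like(W1); dW2 = np.zeros_like(W2)   # per-cycle gradient accum.
state = REGEN_STATE
for t in range(100_000):
    p, x, h = policy(state)
    action = rng.choice(N_ACTIONS, p=p)
    gW1, gW2 = grad_log_policy(action, p, x, h)
    zW1 += gW1; zW2 += gW2                    # eligibility within the cycle
    nxt, cost = step(state, action)
    dW1 += (cost - avg_cost) * zW1            # likelihood-ratio gradient
    dW2 += (cost - avg_cost) * zW2            # estimate of the average cost
    avg_cost += 0.001 * (cost - avg_cost)     # slow average-cost tracking
    state = nxt
    if state == REGEN_STATE:                  # end of regenerative cycle:
        W1 -= gamma * dW1; W2 -= gamma * dW2  # descend the cost gradient
        zW1[:] = 0; zW2[:] = 0; dW1[:] = 0; dW2[:] = 0

Updating only at regeneration times is the key design choice this sketch tries to reflect: the regenerative structure lets the per-cycle accumulations serve as (asymptotically) unbiased estimates of the average-cost gradient, which is what justifies the stochastic gradient step on the actor parameters.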
Keywords :
dynamic programming; function approximation; gradient methods; learning (artificial intelligence); neural nets; stochastic processes; actor neural networks; gradient estimation; infinitesimal perturbation analysis; neuro-dynamic programming; policy-only learning algorithm; regenerative processes; reinforcement learning; Algorithm design and analysis; Artificial intelligence; Artificial neural networks; Computational modeling; Cost function; Industrial engineering; Large-scale systems; Learning; Neural networks; Operations research;
Conference_Title :
International Joint Conference on Neural Networks (IJCNN '99), 1999
Conference_Location :
Washington, DC
Print_ISBN :
0-7803-5529-6
DOI :
10.1109/IJCNN.1999.831139