DocumentCode :
1809886
Title :
Sample path-based policy-only learning by actor neural networks
Author :
Mizutani, Eiji
Author_Institution :
Dept. of Ind. Eng. & Oper. Res., Univ. of California, Berkeley, CA, USA
Volume :
2
fYear :
1999
fDate :
July 1999
Firstpage :
1245
Abstract :
This paper highlights a sample-path-based policy-only learning algorithm for regenerative (stochastic) processes proposed by Marbach and Tsitsiklis (1998). The algorithm attempts to optimize a randomized, parameterized policy with respect to the average-cost criterion, in conjunction with the so-called infinitesimal perturbation analysis gradient estimation technique. We present numerical studies demonstrating this learning algorithm on small-scale problems; in particular, the parameterized policy-only agent is a neural network function approximator in the spirit of neuro-dynamic programming (or reinforcement learning).
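Illustrative Sketch :
The following is a minimal, self-contained sketch (not the paper's code) of the kind of policy-only learning the abstract describes: a small neural-network actor parameterizes a randomized policy over a toy Markov decision process, likelihood-ratio gradient terms are accumulated along the sample path, and an average-cost policy-gradient update is applied at each return to a designated regeneration state, broadly in the spirit of Marbach and Tsitsiklis (1998). The MDP, network sizes, and step sizes below are all illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)

N_STATES, N_ACTIONS, N_HIDDEN = 4, 2, 8
REGEN_STATE = 0                      # designated recurrent (regeneration) state

# Illustrative random MDP: P[a, s, s'] transition probs, c[s, a] one-step costs.
P = rng.random((N_ACTIONS, N_STATES, N_STATES))
P /= P.sum(axis=2, keepdims=True)
c = rng.random((N_STATES, N_ACTIONS))

# One-hidden-layer actor network: one-hot state -> softmax over actions.
W1 = rng.normal(0.0, 0.1, (N_HIDDEN, N_STATES))
W2 = rng.normal(0.0, 0.1, (N_ACTIONS, N_HIDDEN))

def policy(s):
    x = np.zeros(N_STATES); x[s] = 1.0
    h = np.tanh(W1 @ x)
    logits = W2 @ h
    p = np.exp(logits - logits.max()); p /= p.sum()
    return p, x, h

def grad_log_pi(s, a):
    # Backpropagate d log pi(a|s) / d(W1, W2) through the tiny network.
    p, x, h = policy(s)
    dlogits = -p; dlogits[a] += 1.0          # softmax log-likelihood gradient
    gW2 = np.outer(dlogits, h)
    dh = W2.T @ dlogits
    gW1 = np.outer(dh * (1.0 - h**2), x)
    return gW1, gW2

lam = 0.0                                    # running average-cost estimate
step, s = 5e-3, REGEN_STATE
zW1, zW2 = np.zeros_like(W1), np.zeros_like(W2)   # eligibility within a cycle
FW1, FW2 = np.zeros_like(W1), np.zeros_like(W2)   # cycle gradient accumulators

for t in range(200_000):
    p, _, _ = policy(s)
    a = rng.choice(N_ACTIONS, p=p)
    gW1, gW2 = grad_log_pi(s, a)
    zW1 += gW1; zW2 += gW2
    FW1 += (c[s, a] - lam) * zW1             # likelihood-ratio gradient term
    FW2 += (c[s, a] - lam) * zW2
    lam += 0.01 * (c[s, a] - lam)            # track average cost on the path
    s = rng.choice(N_STATES, p=P[a, s])
    if s == REGEN_STATE:                     # regeneration: apply the update
        W1 -= step * FW1; W2 -= step * FW2
        zW1[:] = 0; zW2[:] = 0; FW1[:] = 0; FW2[:] = 0

print(f"estimated average cost after learning: {lam:.3f}")

Resetting the eligibility and gradient accumulators at each regeneration is what makes the per-cycle gradient estimates (approximately) independent, which is the property the sample-path method exploits.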
Keywords :
dynamic programming; function approximation; gradient methods; learning (artificial intelligence); neural nets; stochastic processes; actor neural networks; gradient estimation; infinitesimal perturbation analysis; neural dynamic programming; policy-only learning algorithm; regenerative processes; reinforcement learning; Algorithm design and analysis; Artificial intelligence; Artificial neural networks; Computational modeling; Cost function; Industrial engineering; Large-scale systems; Learning; Neural networks; Operations research
fLanguage :
English
Publisher :
ieee
Conference_Titel :
IJCNN '99: International Joint Conference on Neural Networks, 1999
Conference_Location :
Washington, DC
ISSN :
1098-7576
Print_ISBN :
0-7803-5529-6
Type :
conf
DOI :
10.1109/IJCNN.1999.831139
Filename :
831139