Title :
Sample path-based policy-only learning by actor neural networks
Author_Institution :
Dept. of Ind. Eng. & Oper. Res., Univ. of California, Berkeley, CA, USA
Abstract :
This paper studies a sample-path-based, policy-only learning algorithm for regenerative stochastic processes proposed by Marbach and Tsitsiklis (1998). The algorithm seeks to optimize a randomized, parameterized policy with respect to the average cost criterion, in conjunction with the so-called infinitesimal perturbation analysis gradient estimation technique. We present numerical studies demonstrating this learning algorithm on small-scale problems; in particular, the parameterized policy (the "actor") is a neural network function approximator in the spirit of neuro-dynamic programming (reinforcement learning).
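Illustrative_Sketch :
The abstract summarizes the algorithm only at a high level. Below is a minimal sketch, in Python with NumPy, of a policy-only update of the kind described: within each regenerative cycle of a toy Markov chain, a score-function (likelihood-ratio) eligibility trace of grad log mu_theta(u|x) is accumulated for a small "actor" network, each step's contribution is weighted by (cost - average-cost estimate), and the actor parameters take a gradient step at every return to the regeneration state. The toy chain, network sizes, step sizes, and the use of the basic likelihood-ratio form of the regenerative gradient estimator are assumptions made for illustration, not details taken from the paper.

# Sketch of a sample-path-based, policy-only ("actor-only") update over
# regenerative cycles, in the spirit of Marbach & Tsitsiklis (1998).
# Toy problem and all constants below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

N_STATES, N_ACTIONS, HIDDEN = 5, 2, 8   # assumed toy problem sizes
REGEN_STATE = 0                          # regeneration state i*

# Tiny one-hidden-layer actor network: one-hot state -> action logits.
W1 = 0.1 * rng.standard_normal((HIDDEN, N_STATES))
W2 = 0.1 * rng.standard_normal((N_ACTIONS, HIDDEN))

def policy(state):
    """Randomized policy mu_theta(u | x): softmax over network logits."""
    x = np.zeros(N_STATES); x[state] = 1.0
    h = np.tanh(W1 @ x)
    logits = W2 @ h
    p = np.exp(logits - logits.max()); p /= p.sum()
    return p, x, h

def grad_log_policy(action, p, x, h):
    """Score function grad_theta log mu_theta(u | x), via backprop."""
    dlogits = -p; dlogits[action] += 1.0        # d/dlogits of log-softmax
    gW2 = np.outer(dlogits, h)
    dh = (W2.T @ dlogits) * (1.0 - h * h)       # back through tanh
    gW1 = np.outer(dh, x)
    return gW1, gW2

def step(state, action):
    """Assumed toy regenerative chain: action 0 drifts up (costly states),
    action 1 pays a small penalty but tends to reset toward i*."""
    cost = float(state) + (0.5 if action == 1 else 0.0)
    if action == 1 or rng.random() < 0.2:
        nxt = REGEN_STATE if rng.random() < 0.5 else max(state - 1, 0)
    else:
        nxt = min(state + 1, N_STATES - 1)
    return nxt, cost

gamma, avg_cost = 0.001, 0.0                 # step size; average-cost estimate
zW1 = np.zeros_like(W1); zW2 = np.zeros_like(W2)   # score-function traces
dW1 = np.zeros_like(W1); dW2 = np.zeros_like(W2)   # per-cycle gradient accum.
state = REGEN_STATE
for t in range(100_000):
    p, x, h = policy(state)
    action = rng.choice(N_ACTIONS, p=p)
    gW1, gW2 = grad_log_policy(action, p, x, h)
    zW1 += gW1; zW2 += gW2                    # eligibility within the cycle
    nxt, cost = step(state, action)
    dW1 += (cost - avg_cost) * zW1            # likelihood-ratio gradient
    dW2 += (cost - avg_cost) * zW2            # estimate of the average cost
    avg_cost += 0.001 * (cost - avg_cost)     # slow average-cost tracking
    state = nxt
    if state == REGEN_STATE:                  # end of regenerative cycle:
        W1 -= gamma * dW1; W2 -= gamma * dW2  # descend the cost gradient
        zW1[:] = 0; zW2[:] = 0; dW1[:] = 0; dW2[:] = 0

Updating only at regeneration times is the key design choice this sketch tries to reflect: the regenerative structure lets the per-cycle accumulations serve as (asymptotically) unbiased estimates of the average-cost gradient, which is what justifies the stochastic gradient step on the actor parameters.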
Keywords :
dynamic programming; function approximation; gradient methods; learning (artificial intelligence); neural nets; stochastic processes; actor neural networks; gradient estimation; infinitesimal perturbation analysis; neuro-dynamic programming; policy-only learning algorithm; regenerative processes; reinforcement learning; Algorithm design and analysis; Artificial intelligence; Artificial neural networks; Computational modeling; Cost function; Industrial engineering; Large-scale systems; Learning; Neural networks; Operations research;
Conference_Title :
International Joint Conference on Neural Networks (IJCNN '99), 1999
Conference_Location :
Washington, DC
Print_ISBN :
0-7803-5529-6
DOI :
10.1109/IJCNN.1999.831139