Title :
Totally model-free reinforcement learning by actor-critic Elman networks in non-Markovian domains
Author :
Mizutani, Eiji ; Dreyfus, Stuart E.
Author_Institution :
Dept. of Ind. Eng. & Oper. Res., California Univ., Berkeley, CA, USA
Abstract :
We describe how an actor-critic reinforcement learning agent in a non-Markovian domain finds an optimal sequence of actions in a totally model-free fashion; that is, the agent learns neither transition probabilities and associated rewards, nor by how much the state space should be augmented so that the Markov property holds. In particular, we employ an Elman-type recurrent neural network to solve non-Markovian problems, since such a network can implicitly and automatically render the process Markovian. A standard “actor-critic” neural network model has two separate components: the action (actor) network and the value (critic) network. In animal brains, however, these two are presumably not distinct but somehow entwined. We therefore construct a single Elman network with two output nodes, an actor node and a critic node; a portion of the shared hidden layer is fed back as the context layer, which functions as a history memory and produces sensitivity to non-Markovian dependencies. The agent explores small-scale three- and four-stage triangular path networks to learn an optimal sequence of actions that maximizes the total value (or reward) associated with its transitions from vertex to vertex. The posed problem has a deterministic transition and reward associated with each allowable action (although either could be stochastic) and is rendered non-Markovian by making the reward dependent on an earlier transition. Owing to the nature of neural model-free learning, the agent needs many iterations to find the optimal actions even in these small-scale path problems.
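The shared-network architecture the abstract describes can be sketched in a few lines: one Elman-style recurrent network whose hidden layer is copied back as a context layer, with two output heads reading from the same hidden state. This is a minimal illustrative sketch, not the authors' implementation; all layer sizes, weight initializations, and names (`W_in`, `W_ctx`, `W_actor`, `W_critic`) are assumptions made for the example.

```python
import numpy as np

class ActorCriticElman:
    """Hypothetical sketch of a single Elman network with a shared hidden
    layer and two output nodes: an actor head (action preferences) and a
    critic head (state-value estimate). The context layer stores the
    previous hidden activations, giving the network a history memory."""

    def __init__(self, n_in, n_hidden, n_actions, seed=0):
        rng = np.random.default_rng(seed)
        # Input-to-hidden and context-to-hidden weights (assumed sizes).
        self.W_in = rng.normal(0.0, 0.1, (n_hidden, n_in))
        self.W_ctx = rng.normal(0.0, 0.1, (n_hidden, n_hidden))
        # Two heads reading from the same shared hidden layer.
        self.W_actor = rng.normal(0.0, 0.1, (n_actions, n_hidden))
        self.W_critic = rng.normal(0.0, 0.1, (1, n_hidden))
        self.context = np.zeros(n_hidden)

    def reset(self):
        """Clear the history memory at the start of an episode."""
        self.context = np.zeros_like(self.context)

    def forward(self, x):
        """One time step: returns (action probabilities, value estimate)."""
        h = np.tanh(self.W_in @ x + self.W_ctx @ self.context)
        self.context = h.copy()  # hidden layer fed back as next context
        logits = self.W_actor @ h
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()  # softmax over allowable actions
        value = float(self.W_critic @ h)
        return probs, value
```

Because both heads share the recurrent hidden layer, information the critic needs about earlier transitions (the non-Markovian reward dependency) is also available to the actor through the same context memory.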
Keywords :
learning (artificial intelligence); recurrent neural nets; Elman-type recurrent neural network; action network; actor-critic Elman networks; actor-critic reinforcement learning agent; history memory; neural model-free learning; non-Markovian domains; small-scale path problems; totally model-free reinforcement learning; triangular path-networks; value network; Animals; Biological neural networks; History; Industrial engineering; Intelligent networks; Learning; Neural networks; Operations research; State-space methods; Stochastic processes;
Conference_Titel :
The 1998 IEEE International Joint Conference on Neural Networks Proceedings. IEEE World Congress on Computational Intelligence
Conference_Location :
Anchorage, AK
Print_ISBN :
0-7803-4859-1
DOI :
10.1109/IJCNN.1998.687169