Title :
Totally model-free reinforcement learning by actor-critic Elman networks in non-Markovian domains
Author :
Mizutani, Eiji ; Dreyfus, Stuart E.
Author_Institution :
Dept. of Ind. Eng. & Oper. Res., California Univ., Berkeley, CA, USA
Abstract :
We describe how an actor-critic reinforcement learning agent in a non-Markovian domain finds an optimal sequence of actions in a totally model-free fashion; that is, the agent learns neither transition probabilities and associated rewards, nor by how much the state space should be augmented so that the Markov property holds. In particular, we employ an Elman-type recurrent neural network to solve non-Markovian problems, since such a network can implicitly and automatically render the process Markovian. A standard “actor-critic” neural network model has two separate components: the action (actor) network and the value (critic) network. In animal brains, however, these two are presumably not distinct but somehow entwined. We therefore construct a single Elman network with two output nodes, an actor node and a critic node; a portion of the shared hidden layer is fed back as the context layer, which functions as a history memory and produces sensitivity to non-Markovian dependencies. The agent explores small-scale three- and four-stage triangular path networks to learn an optimal sequence of actions that maximizes the total value (or reward) associated with its transitions from vertex to vertex. The posed problem has a deterministic transition and reward associated with each allowable action (although either could be stochastic) and is rendered non-Markovian by making the reward dependent on an earlier transition. Owing to the nature of neural model-free learning, the agent needs many iterations to find the optimal actions even in these small-scale path problems.
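The shared-network architecture the abstract describes can be sketched in a few lines: one Elman-style recurrent network whose hidden layer is copied back as a context layer, with two output heads reading from the same hidden state. This is a minimal illustrative sketch, not the authors' implementation; all layer sizes, weight initializations, and names (`W_in`, `W_ctx`, `W_actor`, `W_critic`) are assumptions made for the example.

```python
import numpy as np

class ActorCriticElman:
    """Hypothetical sketch of a single Elman network with a shared hidden
    layer and two output nodes: an actor head (action preferences) and a
    critic head (state-value estimate). The context layer stores the
    previous hidden activations, giving the network a history memory."""

    def __init__(self, n_in, n_hidden, n_actions, seed=0):
        rng = np.random.default_rng(seed)
        # Input-to-hidden and context-to-hidden weights (assumed sizes).
        self.W_in = rng.normal(0.0, 0.1, (n_hidden, n_in))
        self.W_ctx = rng.normal(0.0, 0.1, (n_hidden, n_hidden))
        # Two heads reading from the same shared hidden layer.
        self.W_actor = rng.normal(0.0, 0.1, (n_actions, n_hidden))
        self.W_critic = rng.normal(0.0, 0.1, (1, n_hidden))
        self.context = np.zeros(n_hidden)

    def reset(self):
        """Clear the history memory at the start of an episode."""
        self.context = np.zeros_like(self.context)

    def forward(self, x):
        """One time step: returns (action probabilities, value estimate)."""
        h = np.tanh(self.W_in @ x + self.W_ctx @ self.context)
        self.context = h.copy()  # hidden layer fed back as next context
        logits = self.W_actor @ h
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()  # softmax over allowable actions
        value = float(self.W_critic @ h)
        return probs, value
```

Because both heads share the recurrent hidden layer, information the critic needs about earlier transitions (the non-Markovian reward dependency) is also available to the actor through the same context memory.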
Keywords :
learning (artificial intelligence); recurrent neural nets; Elman-type recurrent neural network; action network; actor-critic Elman networks; actor-critic reinforcement learning agent; history memory; neural model-free learning; non-Markovian domains; small-scale path problems; totally model-free reinforcement learning; triangular path-networks; value network; Animals; Biological neural networks; History; Industrial engineering; Intelligent networks; Learning; Neural networks; Operations research; State-space methods; Stochastic processes;
Conference_Titel :
The 1998 IEEE International Joint Conference on Neural Networks Proceedings. IEEE World Congress on Computational Intelligence
Conference_Location :
Anchorage, AK
Print_ISBN :
0-7803-4859-1
DOI :
10.1109/IJCNN.1998.687169