Title :
Two stochastic dynamic programming problems by model-free actor-critic recurrent-network learning in non-Markovian settings
Author :
Mizutani, Eiji; Dreyfus, Stuart E.
Author_Institution :
Dept. of Comput. Sci., Tsing Hua Univ., Hsinchu, Taiwan
Abstract :
We describe two stochastic non-Markovian dynamic programming (DP) problems and show how they can be attacked with actor-critic reinforcement learning using recurrent neural networks (RNNs). We assume that the current state of the dynamical system is "completely observable," but that the rules governing the current reward and state transition, which are unknown to our decision-making agent, depend not only on the current state and action but possibly on the "entire history" of past states and actions. This setting should not be confused with "partially observable Markov decision processes (POMDPs)," in which the current state can only be deduced from partial (observable) states or from error-corrupted observations. Our actor-critic RNN agent is capable of finding an optimal policy without learning the transition probabilities, the associated rewards, or the extent to which the current state space must be augmented for the Markov property to hold. The RNN's recurrent connections, or context units, function as an "implicit" history memory (or internal state) that develops "sensitivity" to non-Markovian dependencies, rendering the process Markovian implicitly and automatically in a "totally model-free" fashion. In particular, using two small-scale longest-path problems in a stochastic non-Markovian setting, we discuss the features of model-free learning in comparison with the model-based approach of the classical DP algorithm.
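Illustrative_Sketch :
The following is a minimal, hypothetical sketch of the idea the abstract describes: an actor-critic agent whose recurrent hidden state acts as an implicit history memory under full state observability. It is written in PyTorch (which is not the authors' implementation), and the copy-delay toy task, network sizes, and learning rate are illustrative assumptions standing in for the paper's stochastic longest-path problems.

import torch
import torch.nn as nn

class RecurrentActorCritic(nn.Module):
    """Actor-critic network whose Elman-style recurrent hidden state
    serves as an implicit memory of past observations."""
    def __init__(self, n_obs, n_actions, hidden=16):
        super().__init__()
        self.cell = nn.RNNCell(n_obs, hidden)      # recurrent "context unit" memory
        self.actor = nn.Linear(hidden, n_actions)  # policy head
        self.critic = nn.Linear(hidden, 1)         # value head

    def forward(self, obs, h):
        h = self.cell(obs, h)                      # fold the new observation into memory
        return self.actor(h), self.critic(h), h

def one_hot(i, n):
    v = torch.zeros(1, n)
    v[0, i] = 1.0
    return v

def run_episode(agent, T=8, gamma=0.95):
    # Copy-delay toy task (hypothetical, not the paper's longest-path
    # problems): reward 1 when the action matches the *previous*
    # observation, so the reward rule is non-Markovian in the current
    # observation and the agent must rely on its recurrent memory.
    h = torch.zeros(1, agent.cell.hidden_size)
    obs = torch.randint(0, 2, (T,))
    logps, values, rewards = [], [], []
    for t in range(T):
        logits, v, h = agent(one_hot(obs[t].item(), 2), h)
        dist = torch.distributions.Categorical(logits=logits)
        a = dist.sample()
        r = 1.0 if t > 0 and a.item() == obs[t - 1].item() else 0.0
        logps.append(dist.log_prob(a).squeeze())
        values.append(v.squeeze())
        rewards.append(r)
    # Discounted returns as critic targets; advantage = return - value.
    G, returns = 0.0, []
    for r in reversed(rewards):
        G = r + gamma * G
        returns.insert(0, G)
    returns = torch.tensor(returns)
    values = torch.stack(values)
    adv = returns - values.detach()
    policy_loss = -(torch.stack(logps) * adv).sum()  # actor: policy gradient
    value_loss = ((returns - values) ** 2).sum()     # critic: value regression
    return policy_loss + 0.5 * value_loss, sum(rewards)

agent = RecurrentActorCritic(n_obs=2, n_actions=2)
opt = torch.optim.Adam(agent.parameters(), lr=1e-2)
for episode in range(2000):
    opt.zero_grad()
    loss, total_reward = run_episode(agent)
    loss.backward()   # backpropagation through time over the episode
    opt.step()

Note that the agent never estimates transition probabilities or rewards; the recurrent hidden state is simply trained end-to-end, which is the "totally model-free" feature the abstract emphasizes.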
Keywords :
Markov processes; decision making; dynamic programming; learning (artificial intelligence); probability; recurrent neural nets; stochastic programming; actor-critic reinforcement learning; decision-making agent; error-corrupted observations; implicit history memory; model-free learning; non-Markovian dynamic programming; non-Markovian settings; partially observable Markov decision processes; recurrent neural networks; small-scale longest-path problems; state-space methods; state transition; stochastic dynamic programming problems; transition probability; Computer science; Context modeling; Dynamic programming; Learning; Recurrent neural networks; Stochastic processes
Conference_Title :
Proceedings of the 2004 IEEE International Joint Conference on Neural Networks (IJCNN 2004)
Print_ISBN :
0-7803-8359-1
DOI :
10.1109/IJCNN.2004.1380084