Title :
Function approximation for large Markov decision processes using self-organizing neural networks
Author_Institution :
Sch. of Comput. Eng., Nanyang Technol. Univ., Singapore, Singapore
Date :
7/1/2015
Abstract :
Reinforcement learning (RL) in problems with large Markov decision processes (MDPs) can give rise to a very large number of action policies. Over time, searching for action policies that are effective for the encountered states becomes computationally non-trivial. Function approximation can generalize across action policies to improve search efficiency. FASON is a self-organizing neural network, based on Adaptive Resonance Theory, that can be used for such function approximation. This paper introduces the features that allow FASON to act and learn efficiently in large MDPs. Because FASON is structurally compatible with propositional rules, its performance can be improved by directly inserting domain knowledge that is difficult to learn. To speed up RL, a Knowledge-based Exploration strategy is used to direct exploration. To select cognitive nodes that are effective for the states, a Reward Vigilance Adaptation strategy is used to adapt the reward vigilance. The performance of FASON is compared with selected benchmark models on three RL problems with large MDPs. Ten test cases were designed to study the effect of four key parameters on FASON, and experimental results for the winning parameter settings are presented to suggest the best ways of using FASON.
Keywords :
"Approximation methods","Estimation","Space vehicles"
Conference_Titel :
Neural Networks (IJCNN), 2015 International Joint Conference on
Electronic_ISSN :
2161-4407
DOI :
10.1109/IJCNN.2015.7280608