Title :
Replay, Working Memory and Action Selection in Temporal Credit Assignment - a Simple Neural Network Model
Author_Institution :
RIKEN Brain Sci. Inst., Saitama
Abstract :
Reverse replay of recent behavioural sequences immediately after experience at the reward location has recently been reported. Here we suggest a simple model of how such replay can be used to select an action and provide a reinforcement signal in the temporal credit assignment problem. When reward is found, replay of the recent states experienced prior to discovery of the reward is generated as a type of working memory. This replay reinforces the actions associated with the replayed states, which are necessarily the actions that led to the reward being found, while other competing actions are not reinforced by the replay. Conversely, actions that are selected but not reinforced are punished. We propose a firing-rate neural network implementation of this system based on the anatomy of the basal ganglia, with input from a cortical association layer that generates the replay, and auto-catalytic feedback from the dopamine system acting as a reward signal that modulates three-way Hebbian long-term potentiation and depression (LTP/LTD) at the cortical-striatal synapses. The model is illustrated by numerical simulations of a simple example: associating a cue signal with the correct action to obtain reward after a delay period, as is typical of primate cue-reward tasks. Over the course of learning, the model shows a transition from an exploratory phase, in which actions are generated randomly, to a goal-directed phase, in which the animal always chooses the correct action for each experienced state.
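The learning scheme described in the abstract can be illustrated with a minimal sketch. This is not the paper's firing-rate model: it is a toy tabular version, and the task mapping, noise level, learning rate, and all function and variable names are assumptions made for illustration. The weight change is the product of presynaptic activity, postsynaptic activity, and a dopamine signal (a three-way Hebbian rule), and it is applied to recently visited state-action pairs replayed in reverse order once the trial outcome is known.

```python
import random

def three_factor_update(w, pre, post, dopamine, lr=0.1):
    """Three-way Hebbian rule (illustrative): the weight change is the
    product of presynaptic activity, postsynaptic activity and dopamine.
    Positive dopamine gives potentiation, negative gives depression."""
    return w + lr * pre * post * dopamine

def train(n_cues=2, n_actions=2, trials=500, lr=0.1, noise=0.5, seed=0):
    """Toy cue -> delay -> action -> reward task (assumed setup)."""
    rng = random.Random(seed)
    # w[cue][action]: hypothetical cortico-striatal weights, unbounded here
    w = [[0.0] * n_actions for _ in range(n_cues)]
    # Assumed task mapping: cue c is rewarded only for action c % n_actions
    correct = {c: c % n_actions for c in range(n_cues)}
    for _ in range(trials):
        cue = rng.randrange(n_cues)
        # Noisy winner-take-all selection: noise dominates while weights
        # are small (exploration), weights dominate later (goal-directed)
        scores = [w[cue][a] + rng.gauss(0.0, noise) for a in range(n_actions)]
        action = scores.index(max(scores))
        history = [(cue, action)]  # working-memory trace of the trial
        rewarded = action == correct[cue]
        # Reward drives positive dopamine; a selected but unrewarded
        # action is punished with negative dopamine
        dopamine = 1.0 if rewarded else -1.0
        # Replay recent state-action pairs in reverse order and update
        for state, act in reversed(history):
            w[state][act] = three_factor_update(
                w[state][act], 1.0, 1.0, dopamine, lr)
    return w

if __name__ == "__main__":
    weights = train()
    for cue, row in enumerate(weights):
        best = max(range(len(row)), key=lambda a: row[a])
        print(f"cue {cue}: preferred action {best}")
```

In this sketch each trial holds a single state-action pair, so the replay loop runs once per trial; a longer behavioural sequence would simply append more pairs to the trace before the reverse replay.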
Keywords :
Hebbian learning; neural nets; Basal-Ganglia anatomy; Hebbian long term depression; Hebbian long term potentiation; action correction; action replay; action selection; autocatalytic feedback; behavioural sequence; cortical association layer; cortical-striatal synapsis; cue signal; dopamine system; firing rate neural network model; learning; numerical simulation; reinforcement signal; reward discovery; reward location; reward signal; temporal credit assignment; working memory; Anatomy; Animals; Basal ganglia; Delay; Neural networks; Neurofeedback; Neurons; Numerical simulation; Signal generators; Switches;
Conference_Title :
Neural Networks, 2007. IJCNN 2007. International Joint Conference on
Conference_Location :
Orlando, FL
Print_ISBN :
978-1-4244-1379-9
Electronic_ISBN :
1098-7576
DOI :
10.1109/IJCNN.2007.4371440