Continuous-time on-policy neural Reinforcement Learning of working memory tasks

Author

Davide Zambrano;Pieter R. Roelfsema;Sander M. Bohte

Author_Institution

CWI, Amsterdam, The Netherlands

fYear

2015

fDate

7/1/2015 12:00:00 AM

Firstpage

1

Lastpage

8

Abstract

As living organisms, one of our primary characteristics is the ability to rapidly process and react to unknown and unexpected events. To this end, we are able to recognize an event or a sequence of events and learn to respond properly. Despite advances in machine learning, current cognitive robotic systems are not able to respond rapidly and efficiently in the real world: the challenge is to learn to recognize both what is important and when to act. Reinforcement Learning (RL) is typically used to solve complex tasks: to learn the how. To respond quickly (to learn when), the environment has to be sampled often enough. For “enough”, a programmer has to decide on the step-size as a time-representation, choosing between a fine-grained representation of time (many state-transitions; difficult to learn with RL) and a coarse temporal resolution (easier to learn with RL but lacking precise timing). Here, we derive a continuous-time version of on-policy SARSA-learning in a working-memory neural network model, AuGMEnT. Using a neural working memory network resolves the what problem; our when solution is built on the notion that in the real world, instantaneous actions of duration dt are actually impossible. We demonstrate how we can decouple action duration from the internal time-steps in the neural RL model using an action selection system. The resultant CT-AuGMEnT successfully learns to react to the events of a continuous-time task, without any pre-imposed specifications about the duration of the events or the delays between them.
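The idea of decoupling action duration from the internal time step can be illustrated with a small sketch. This is not AuGMEnT or CT-AuGMEnT: it is plain tabular SARSA on a hypothetical two-state cue-response task, and the task, the minimum hold time `MIN_HOLD`, the per-step discount `GAMMA ** DT`, and all constants are assumptions made for illustration only.

```python
import random

DT = 0.1        # internal time step (assumed value)
GAMMA = 0.9     # discount per unit of time; per-step discount is GAMMA ** DT
ALPHA = 0.1     # learning rate
EPSILON = 0.1   # exploration rate
MIN_HOLD = 3    # a chosen action is held for at least this many internal steps

WAIT, RESPOND = 0, 1
NO_CUE, CUE = 0, 1


def select_action(q, state, rng):
    """Epsilon-greedy choice over the two actions."""
    if rng.random() < EPSILON:
        return rng.choice((WAIT, RESPOND))
    return max((WAIT, RESPOND), key=lambda a: q[state][a])


def run_episode(q, rng, max_steps=200):
    """One episode of a toy cue-response task, sampled every DT."""
    cue_onset = rng.randint(5, 30)  # the cue appears at an unpredictable step
    state = NO_CUE
    action = select_action(q, state, rng)
    held = 0
    for step in range(max_steps):
        # Responding while the cue is on is rewarded and ends the episode;
        # responding too early carries a small cost.
        if action == RESPOND and state == CUE:
            reward, done = 1.0, True
        elif action == RESPOND:
            reward, done = -0.1, False
        else:
            reward, done = 0.0, False
        next_state = CUE if step + 1 >= cue_onset else NO_CUE

        # Action selection is decoupled from the internal clock: a new action
        # may only be picked once the current one has been held long enough.
        held += 1
        next_action = action
        if held >= MIN_HOLD:
            next_action = select_action(q, next_state, rng)
            if next_action != action:
                held = 0

        # On-policy SARSA update, applied at every internal step.
        if done:
            target = reward
        else:
            target = reward + (GAMMA ** DT) * q[next_state][next_action]
        q[state][action] += ALPHA * (target - q[state][action])
        if done:
            return step + 1
        state, action = next_state, next_action
    return max_steps


def train(episodes=2000, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0], [0.0, 0.0]]  # q[state][action]
    for _ in range(episodes):
        run_episode(q, rng)
    return q


q = train()
```

In this sketch the value table is updated at every internal step of size `DT`, while the choice of action is only revisited after the current action has lasted `MIN_HOLD` steps. After training, `q[CUE][RESPOND]` should exceed `q[CUE][WAIT]`, meaning the agent prefers to respond once the cue is present.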

Keywords

"Biological system modeling","Brain modeling","Feedforward neural networks"

Publisher

IEEE

Conference_Titel

Neural Networks (IJCNN), 2015 International Joint Conference on

Electronic_ISBN

2161-4407

Type

conf

DOI

10.1109/IJCNN.2015.7280636

Filename

7280636