Title :
Exploring the relationship of reward and punishment in reinforcement learning
Author :
Lowe, Robert; Ziemke, Tom
Author_Institution :
Interaction Lab., Univ. of Skovde, Skovde, Sweden
Abstract :
We present a reinforcement learning algorithm based on Dyna-Sarsa that uses separate representations of reward and punishment to guide state-action value learning and action selection. We explore the use of policy meta-learning optimized by a genetic algorithm, and present results for a two-armed bandit goal-navigation task in a simple grid world. The findings argue for an important role for a genetic algorithm approach in constructing the foundations of autonomous reinforcement learning agents.
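The abstract names the core idea (Dyna-Sarsa with separate reward and punishment representations) without giving its equations. The following is a minimal, hypothetical sketch of one way such an agent could be structured. It assumes a tabular setting with two value tables (one for reward, one for non-negative punishment magnitudes), each updated with its own Sarsa TD error, and epsilon-greedy action selection on the difference `q_reward - punish_weight * q_punish`. It also assumes Dyna-style planning that replays transitions from a learned deterministic model. The class name, `punish_weight`, `planning_steps`, and the combination rule are all assumptions for illustration, not the authors' method. The genetic-algorithm meta-learning is not shown.

```python
import random
from collections import defaultdict


class DualValueDynaSarsa:
    """Hypothetical sketch: tabular Sarsa with separate reward and punishment
    value tables, plus Dyna-style planning from a learned deterministic model.
    The way the two tables are combined is an assumption, not the paper's rule."""

    def __init__(self, actions, alpha=0.1, gamma=0.9, epsilon=0.1,
                 punish_weight=1.0, planning_steps=5, seed=0):
        self.actions = list(actions)
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.punish_weight = punish_weight    # assumed reward/punishment trade-off
        self.planning_steps = planning_steps  # simulated updates per real step
        self.q_reward = defaultdict(float)    # (state, action) -> expected future reward
        self.q_punish = defaultdict(float)    # (state, action) -> expected future punishment (>= 0)
        self.model = {}                       # (state, action) -> (reward, punishment, next_state, done)
        self.rng = random.Random(seed)

    def net_value(self, s, a):
        # Combined value, used only for action selection (assumed subtraction).
        return self.q_reward[(s, a)] - self.punish_weight * self.q_punish[(s, a)]

    def select_action(self, s):
        # Epsilon-greedy over the combined value; ties go to the first action listed.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return max(self.actions, key=lambda a: self.net_value(s, a))

    def _sarsa_update(self, s, a, r, p, s2, a2, done):
        # Separate on-policy TD errors for the reward and punishment channels.
        next_r = 0.0 if done else self.q_reward[(s2, a2)]
        next_p = 0.0 if done else self.q_punish[(s2, a2)]
        self.q_reward[(s, a)] += self.alpha * (r + self.gamma * next_r - self.q_reward[(s, a)])
        self.q_punish[(s, a)] += self.alpha * (p + self.gamma * next_p - self.q_punish[(s, a)])

    def learn(self, s, a, r, p, s2, a2, done):
        # 1) Direct Sarsa update from real experience (a2 = action actually taken next).
        self._sarsa_update(s, a, r, p, s2, a2, done)
        # 2) Record the transition in a deterministic model.
        self.model[(s, a)] = (r, p, s2, done)
        # 3) Dyna planning: replay remembered transitions, choosing the
        #    next action from the current policy as Sarsa would.
        keys = list(self.model)
        for _ in range(self.planning_steps):
            ps, pa = self.rng.choice(keys)
            pr, pp, ps2, pdone = self.model[(ps, pa)]
            pa2 = self.select_action(ps2)
            self._sarsa_update(ps, pa, pr, pp, ps2, pa2, pdone)
```

As a usage illustration (again hypothetical), consider a single-step choice in which both arms pay the same reward but only arm `"B"` is also punished. An agent trained with `epsilon=0.0` should come to prefer `"A"`:

```python
agent = DualValueDynaSarsa(actions=["A", "B"], epsilon=0.0)
for _ in range(100):
    agent.learn("start", "A", 1.0, 0.0, "end", "A", True)
    agent.learn("start", "B", 1.0, 1.0, "end", "A", True)
print(agent.select_action("start"))  # "A"
```

Keeping reward and punishment in separate tables, rather than summing them into one scalar, leaves the two signals individually inspectable. It also exposes a trade-off parameter (`punish_weight` here) that a meta-learning procedure could in principle tune. The record does not say which meta-parameters the paper's genetic algorithm actually optimizes.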
Keywords :
genetic algorithms; learning (artificial intelligence); Dyna-Sarsa algorithm; action meta-learning functions; action selection; autonomous reinforcement learning agents; genetic algorithm approach; grid world; meta-learning policy optimization; punishment; reinforcement learning algorithm; reward; state-action value learning; two-armed bandit goal-navigation task; Context; Cost accounting; Genetic algorithms; Learning (artificial intelligence); Navigation; Optimization; Planning; Genetic Algorithm; Punishment; Reinforcement Contingencies; Reward; SARSA; TD learning; Value;
Conference_Title :
Adaptive Dynamic Programming And Reinforcement Learning (ADPRL), 2013 IEEE Symposium on
Conference_Location :
Singapore
DOI :
10.1109/ADPRL.2013.6615000