• DocumentCode
    808568
  • Title
    Reinforcement Learning in Continuous Time and Space: Interference and Not Ill Conditioning Is the Main Problem When Using Distributed Function Approximators
  • Author
    Baddeley, Bart

  • Author_Institution
    Dept. of Inf., Univ. of Sussex, Brighton
  • Volume
    38
  • Issue
    4
  • fYear
    2008
  • Firstpage
    950
  • Lastpage
    956
  • Abstract
    Many interesting problems in reinforcement learning (RL) are continuous and/or high dimensional, and in such cases, RL techniques require the use of function approximators for learning value functions and policies. Often, local linear models have been preferred over distributed nonlinear models for function approximation in RL. We suggest that one reason for the difficulties encountered when using distributed architectures in RL is the problem of negative interference, whereby learning of new data disrupts previously learned mappings. The continuous temporal difference (TD) learning algorithm TD(lambda) was used to learn a value function in a limited-torque pendulum swing-up task using a multilayer perceptron (MLP) network. Three different approaches were examined for learning in the MLP networks: 1) simple gradient descent; 2) vario-eta; and 3) a pseudopattern rehearsal strategy that attempts to reduce the effects of interference. Our results show that MLP networks can be used for value function approximation in this task but require long training times. We also found that vario-eta destabilized learning and resulted in a failure of the learning process to converge. Finally, we showed that the pseudopattern rehearsal strategy drastically improved the speed of learning. The results indicate that interference is a greater problem than ill conditioning for this task.
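    The pseudopattern rehearsal strategy named in the abstract can be illustrated with a minimal supervised sketch (this is a hypothetical illustration, not the paper's code or task): before training a network on new data, random input probes are labelled with the network's own current outputs, and those "pseudopatterns" are rehearsed alongside the new data so the old input-output mapping is not overwritten.

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny MLP: 2 inputs -> 20 tanh hidden units -> 1 output (illustrative sizes)
W1 = rng.normal(0, 0.5, (2, 20)); b1 = np.zeros(20)
W2 = rng.normal(0, 0.5, (20, 1)); b2 = np.zeros(1)

def forward(X):
    h = np.tanh(X @ W1 + b1)
    return h @ W2 + b2, h

def sgd_step(X, y, lr=0.05):
    """One full-batch gradient-descent step on squared error."""
    global W1, b1, W2, b2
    out, h = forward(X)
    err = out - y                                  # (n, 1)
    gW2 = h.T @ err / len(X); gb2 = err.mean(0)
    dh = (err @ W2.T) * (1 - h ** 2)               # backprop through tanh
    gW1 = X.T @ dh / len(X); gb1 = dh.mean(0)
    W2 -= lr * gW2; b2 -= lr * gb2
    W1 -= lr * gW1; b1 -= lr * gb1

# Phase 1: learn a toy mapping on one region of input space
X_old = rng.uniform(-1, 0, (200, 2))
y_old = np.sin(3 * X_old[:, :1])
for _ in range(3000):
    sgd_step(X_old, y_old)

# Pseudopatterns: random probes labelled by the *current* network,
# snapshotting what it has already learned.
X_pseudo = rng.uniform(-1, 1, (200, 2))
y_pseudo, _ = forward(X_pseudo)

# Phase 2: new data from a different region, rehearsed with pseudopatterns
# so the phase-1 mapping is not catastrophically overwritten.
X_new = rng.uniform(0, 1, (200, 2))
y_new = np.sin(3 * X_new[:, :1])
for _ in range(3000):
    sgd_step(np.vstack([X_new, X_pseudo]), np.vstack([y_new, y_pseudo]))

old_err = float(np.mean((forward(X_old)[0] - y_old) ** 2))  # retention
new_err = float(np.mean((forward(X_new)[0] - y_new) ** 2))  # new learning
```

    Without the pseudopattern rows in the phase-2 batch, gradient descent on the new region alone would be free to redeploy the shared hidden units and degrade the phase-1 fit; mixing in self-generated targets is what anchors the old mapping.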
  • Keywords
    continuous time systems; function approximation; gradient methods; learning (artificial intelligence); multilayer perceptrons; continuous temporal difference learning algorithm; distributed function approximator; gradient descent method; multilayer perceptron; pendulum swing-up task; pseudopattern rehearsal strategy; reinforcement learning; value function approximation; vario-eta; Biological neural networks; Biotechnology; Delay effects; Feedforward neural networks; Function approximation; Interference; Learning; Motor drives; Multilayer perceptrons; Neural networks; Continuous time systems; Ill-conditioning; distributed memory systems; feedforward neural networks; interference; Algorithms; Computer Simulation; Feedback; Models, Theoretical; Neural Networks (Computer); Programming, Linear; Reinforcement (Psychology); Systems Theory;
  • fLanguage
    English
  • Journal_Title
    IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics
  • Publisher
    IEEE
  • ISSN
    1083-4419
  • Type
    jour
  • DOI
    10.1109/TSMCB.2008.921000
  • Filename
    4567536