  • DocumentCode
    70492
  • Title
    Clipping in Neurocontrol by Adaptive Dynamic Programming
  • Author
    Fairbank, Michael ; Prokhorov, Danil ; Alonso, E.
  • Author_Institution
    Dept. of Comput. Sci., City Univ. London, London, UK
  • Volume
    25
  • Issue
    10
  • fYear
    2014
  • fDate
    Oct. 2014
  • Firstpage
    1909
  • Lastpage
    1920
  • Abstract
    In adaptive dynamic programming, neurocontrol, and reinforcement learning, the objective is for an agent to learn to choose actions so as to minimize a total cost function. In this paper, we show that when discretized time is used to model the motion of the agent, it can be very important to do clipping on the motion of the agent in the final time step of the trajectory. By clipping, we mean that the final time step of the trajectory is to be truncated such that the agent stops exactly at the first terminal state reached, and no distance further. We demonstrate that when clipping is omitted, learning performance can fail to reach the optimum, and when clipping is done properly, learning performance can improve significantly. The clipping problem we describe affects algorithms that use explicit derivatives of the model functions of the environment to calculate a learning gradient. These include backpropagation through time for control and methods based on dual heuristic programming. However, the clipping problem does not significantly affect methods based on heuristic dynamic programming, temporal differences learning, or policy-gradient learning algorithms.
  • Keywords
    dynamic programming; heuristic programming; learning (artificial intelligence); neurocontrollers; adaptive dynamic programming; agent motion modeling; clipping problem; discretized time; dual heuristic programming; heuristic dynamic programming; learning gradient; learning performance; neurocontrol; policy-gradient learning algorithms; reinforcement learning; temporal differences learning; Backpropagation; Cost function; Dynamic programming; Heuristic algorithms; Mathematical model; Trajectory; Vectors; Backpropagation through time (BPTT); clipping; dual heuristic programming (DHP); neurocontrol; value-gradient learning
  • fLanguage
    English
  • Journal_Title
    Neural Networks and Learning Systems, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    2162-237X
  • Type
    jour
  • DOI
    10.1109/TNNLS.2014.2297991
  • Filename
    6718072