Title :
Sample-based information-theoretic stochastic optimal control
Author :
Lioutikov, Rudolf ; Paraschos, Alexandros ; Peters, Jan ; Neumann, Gerhard
Author_Institution :
Tech. Univ. Darmstadt, Darmstadt, Germany
fDate :
May 31 2014-June 7 2014
Abstract :
Many Stochastic Optimal Control (SOC) approaches rely on samples to obtain either an estimate of the value function or a linearisation of the underlying system model. However, these approaches typically neglect the fact that the accuracy of the policy update depends on how close the resulting trajectory distribution stays to these samples. The greedy operator does not impose such a closeness constraint to the samples. Hence, the greedy operator can lead to oscillations or even instabilities in the policy updates. Such undesired behaviour is likely to result in inferior performance of the estimated policy. We take inspiration from the reinforcement learning community and relax the greedy operator used in SOC with an information-theoretic bound that limits the 'distance' between two subsequent trajectory distributions in a policy update. The introduced bound ensures a smooth and stable policy update. Our method is also well suited for model-based reinforcement learning, where we estimate the system dynamics model from data. As this model is likely to be inaccurate, exploiting it greedily can be dangerous. Instead, our bound ensures that we generate new data in the vicinity of the current data, so that we can improve our estimate of the system dynamics model. We show that our approach outperforms several state-of-the-art approaches on challenging simulated robot control tasks.
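The core idea of the abstract, replacing the greedy operator with an update whose KL divergence from the current sample distribution is bounded, can be illustrated with a small episodic sketch. This is not the authors' algorithm, which operates on trajectory distributions in a stochastic optimal control setting; it is a simplified reweighting in the style of Relative Entropy Policy Search (REPS) over a fixed set of samples. It assumes a uniform sampling distribution q, a hypothetical quadratic toy return, and a weighted maximum-likelihood Gaussian refit; the helper names kl_bounded_weights and weighted_gaussian_fit are illustrative only. The weights maximise E_p[R] subject to KL(p || q) <= epsilon, which gives p_i proportional to exp(R_i / eta) for a temperature eta chosen so the bound holds.

```python
import numpy as np


def kl_bounded_weights(returns, epsilon, tol=1e-10, max_iter=200):
    """Sample weights p maximising E_p[R] subject to KL(p || q) <= epsilon,
    with q uniform over the N samples. The solution has the form
    p_i ~ exp(R_i / eta); the temperature eta is found by bisection."""
    r = np.asarray(returns, dtype=float)
    n = r.size
    r = r - r.max()  # shift for numerical stability; p is unchanged

    def weights(eta):
        w = np.exp(r / eta)
        return w / w.sum()

    def kl_to_uniform(p):
        nz = p > 0
        return float(np.sum(p[nz] * np.log(p[nz] * n)))

    # KL(p_eta || q) decreases monotonically as eta grows, so bisect on log(eta).
    lo, hi = np.log(1e-8), np.log(1e8)
    if kl_to_uniform(weights(np.exp(lo))) <= epsilon:
        # Bound is loose enough (epsilon >= log N): the update is (nearly) greedy.
        return weights(np.exp(lo))
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if kl_to_uniform(weights(np.exp(mid))) > epsilon:
            lo = mid  # still too greedy: raise the temperature
        else:
            hi = mid  # bound satisfied: try a lower temperature
        if hi - lo < tol:
            break
    # The hi side satisfies the bound (assuming eta = 1e8 already does).
    return weights(np.exp(hi))


def weighted_gaussian_fit(samples, p):
    """Weighted maximum-likelihood refit of a Gaussian policy."""
    mean = p @ samples
    diff = samples - mean
    cov = (p[:, None] * diff).T @ diff
    return mean, cov


# Hypothetical toy problem: parameters drawn from the current policy N(0, I),
# scored by a quadratic return whose optimum is at (1, 1).
rng = np.random.default_rng(0)
theta = rng.normal(size=(50, 2))
returns = -np.sum((theta - 1.0) ** 2, axis=1)

p = kl_bounded_weights(returns, epsilon=0.5)
mean, cov = weighted_gaussian_fit(theta, p)
print("new mean:", mean)
```

With a small epsilon the weights stay close to uniform, so the refit Gaussian moves only part of the way toward the best samples. Once epsilon reaches log N the bound no longer binds and all weight collapses onto the single best sample, which is the greedy behaviour the abstract argues against.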
Keywords :
control engineering computing; learning (artificial intelligence); manipulators; optimal control; statistical distributions; stochastic processes; RL; SOC; greedy operator; reinforcement learning; robot arm; robot control tasks; stochastic optimal control; system dynamics model estimation; trajectory distribution; Approximation methods; Computational modeling; Data models; Optimization; Robots; System-on-chip; Trajectory;
Conference_Titel :
Robotics and Automation (ICRA), 2014 IEEE International Conference on
Conference_Location :
Hong Kong
DOI :
10.1109/ICRA.2014.6907424