DocumentCode :
3716838
Title :
Differential dynamic programming with temporally decomposed dynamics
Author :
Akihiko Yamaguchi;Christopher G. Atkeson
Author_Institution :
Robotics Institute, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh PA 15213, United States
fYear :
2015
Firstpage :
696
Lastpage :
703
Abstract :
We explore a temporal decomposition of dynamics in order to enhance policy learning with unknown dynamics. There are model-free and model-based methods for policy learning with unknown dynamics, but both approaches have problems: in general, model-free methods have less generalization ability, while model-based methods are often limited by the assumed model structure or need many samples to build models. We consider a temporal decomposition of dynamics to make learning models easier. To obtain a policy, we apply differential dynamic programming (DDP). A feature of our method is that we consider decomposed dynamics even when there is no action to be taken, which allows us to decompose dynamics more flexibly. Consequently, the learned dynamics become more accurate. Our DDP is a first-order gradient descent algorithm with a stochastic evaluation function. In DDP with learned models, there are typically many local maxima. In order to avoid them, we consider multiple evaluation criteria: in addition to the stochastic evaluation function, we use a reference value function. This method was verified in pouring simulation experiments where we created complicated dynamics. The results show that we can optimize actions with DDP while learning dynamics models.
Keywords :
"Dynamic programming","Stochastic processes","Heuristic algorithms","Containers","Computational modeling","Robots","Optimization"
Publisher :
ieee
Conference_Titel :
2015 IEEE-RAS 15th International Conference on Humanoid Robots (Humanoids)
Type :
conf
DOI :
10.1109/HUMANOIDS.2015.7363430
Filename :
7363430