DocumentCode :
3709996
Title :
Learning compound multi-step controllers under unknown dynamics
Author :
Weiqiao Han;Sergey Levine;Pieter Abbeel
Author_Institution :
Department of Electrical Engineering and Computer Sciences, University of California at Berkeley, CA, USA
fYear :
2015
fDate :
9/1/2015 12:00:00 AM
Firstpage :
6435
Lastpage :
6442
Abstract :
Applications of reinforcement learning for robotic manipulation often assume an episodic setting. However, controllers trained with reinforcement learning are often situated in the context of a more complex compound task, where multiple controllers might be invoked in sequence to accomplish a higher-level goal. Furthermore, training such controllers typically requires resetting the environment between episodes, which is typically handled manually. We describe an approach for training chains of controllers with reinforcement learning. This requires taking into account the state distributions induced by preceding controllers in the chain, as well as automatically training reset controllers that can reset the task between episodes. The initial state of each controller is determined by the controller that precedes it, resulting in a non-stationary learning problem. We demonstrate that a recently developed method that optimizes linear-Gaussian controllers under learned local linear models can tackle this sort of non-stationary problem, and that training controllers concurrently with a corresponding reset controller only minimally increases training time. We also demonstrate this method on a complex tool use task that consists of seven stages and requires using a toy wrench to screw in a bolt. This compound task requires grasping and handling complex contact dynamics. After training, the controllers can execute the entire task quickly and efficiently. Finally, we show that this method can be combined with guided policy search to automatically train nonlinear neural network controllers for a grasping task with considerable variation in target position.
Keywords :
"Heuristic algorithms","Training","Learning (artificial intelligence)","Robots","Compounds","Trajectory","Neural networks"
Publisher :
IEEE
Conference_Title :
Intelligent Robots and Systems (IROS), 2015 IEEE/RSJ International Conference on
Type :
conf
DOI :
10.1109/IROS.2015.7354297
Filename :
7354297