Reinforcement learning of full-body humanoid motor skills

Author

Stulp, Freek ; Buchli, Jonas ; Theodorou, Evangelos ; Schaal, Stefan

Author_Institution

Comput. Learning & Motor Control Lab., Univ. of Southern California, Los Angeles, CA, USA

fYear

2010

fDate

6-8 Dec. 2010

Firstpage

405

Lastpage

410

Abstract

Applying reinforcement learning to humanoid robots is challenging because humanoids have a large number of degrees of freedom, and their state and action spaces are continuous. Most reinforcement learning algorithms therefore become computationally infeasible and require a prohibitive number of trials to explore such high-dimensional spaces. In this paper, we present a probabilistic reinforcement learning approach derived from the framework of stochastic optimal control and path integrals. The algorithm, called Policy Improvement with Path Integrals (PI²), has a surprisingly simple form, has no open tuning parameters besides the exploration noise, is model-free, and is numerically robust in high-dimensional learning problems. We demonstrate how PI² is able to learn full-body motor skills on a 34-DOF humanoid robot. To demonstrate the generality of our approach, we also apply PI² in the context of variable impedance control, where both planned trajectories and gain schedules for each joint are optimized simultaneously.
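
The abstract does not spell out the update rule, so the following is only a heavily simplified sketch of the general idea behind path-integral policy improvement: perturb the policy parameters with exploration noise, score each rollout by its cost, and average the perturbations with weights that favor low-cost rollouts. It is not the authors' full algorithm, which weights perturbations per time step along a trajectory. The names `softmax_weights`, `improve`, and `toy_cost`, the sharpness constant `h`, the rollout count, and the quadratic cost are all assumptions made for illustration.

```python
import math
import random


def softmax_weights(costs, h=10.0):
    """Map rollout costs to probabilities; lower cost gets higher weight.

    Costs are rescaled to [0, 1] before exponentiation. The constant h
    is an assumed sharpness value, not taken from the abstract.
    """
    lo, hi = min(costs), max(costs)
    span = hi - lo
    if span == 0.0:
        # All rollouts are equally good: fall back to uniform weights.
        return [1.0 / len(costs)] * len(costs)
    expo = [math.exp(-h * (c - lo) / span) for c in costs]
    total = sum(expo)
    return [e / total for e in expo]


def improve(theta, cost_fn, n_rollouts=20, sigma=0.5, rng=None):
    """One update: sample noisy parameters, then average noise by weight."""
    if rng is None:
        rng = random.Random(0)
    # Exploration noise: one Gaussian perturbation vector per rollout.
    eps = [[rng.gauss(0.0, sigma) for _ in theta] for _ in range(n_rollouts)]
    costs = [cost_fn([t + e for t, e in zip(theta, ek)]) for ek in eps]
    w = softmax_weights(costs)
    # New parameters = old parameters + probability-weighted noise.
    return [
        t + sum(wk * ek[i] for wk, ek in zip(w, eps))
        for i, t in enumerate(theta)
    ]


def toy_cost(theta):
    """Hypothetical quadratic cost with its minimum at (1, -2)."""
    return (theta[0] - 1.0) ** 2 + (theta[1] + 2.0) ** 2


rng = random.Random(0)
theta = [0.0, 0.0]
for _ in range(50):
    theta = improve(theta, toy_cost, rng=rng)
final_cost = toy_cost(theta)
```

In this toy setting the only quantity that shapes exploration is the noise scale `sigma`, which mirrors the abstract's remark that exploration noise is the one open parameter. No gradient or model of `toy_cost` is used, only sampled costs.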

Keywords

humanoid robots; learning (artificial intelligence); optimal control; path planning; position control; stochastic processes; 34-DOF humanoid robot; degrees of freedom; full body humanoid motor skill; impedance control; open tuning parameter; path integral; planned trajectory; policy improvement; probabilistic reinforcement learning; stochastic optimal control; Humanoid robots; Joints; Learning; Noise; Optimal control; Trajectory;

fLanguage

English

Publisher

IEEE

Conference_Title

2010 10th IEEE-RAS International Conference on Humanoid Robots (Humanoids)

Conference_Location

Nashville, TN

Print_ISBN

978-1-4244-8688-5

Electronic_ISBN

978-1-4244-8689-2

Type

conf

DOI

10.1109/ICHR.2010.5686320

Filename

5686320