Simultaneous motor and sensory learning for imitation

Author

Tingfan Wu ; Movellan, J.

Author_Institution

Machine Perception Lab., UC San Diego, San Diego, CA, USA

fYear

2012

fDate

7-9 Nov. 2012

Firstpage

1

Lastpage

2

Abstract

For future robots to move beyond the factory assembly line into daily human life, one important capability is acquiring new motor skills autonomously from naïve human teachers. Recently, many new imitation learning algorithms have been proposed with promising results on tasks such as throwing a ball into a basket [1], jiggling ping-pong balls on a racket [2], and moving a tendon-driven hand to tap a switch [3]. These robots typically start by blindly copying the demonstrator's trajectory and then iteratively improve it with respect to a predefined “goodness” function, which we call the “scoring function.” Most of these tasks are spatial movements, and thus the scoring function can be heuristically defined based on the end-effector position or system state. However, in many real-life tasks, the outcome of an action is more important than the movement trajectory itself. A proper measurable scoring function that evaluates the “goodness” of the outcome is thus necessary. In this work, we consider the case where the scoring function is not a trivial spatial mapping from robot movement to a scalar. One example is the softness of the sound of a violin played by a robot with a certain bow contact force and velocity. Another example is the happiness of a robotic face given the amount of facial servo movement. In such cases, the non-trivial cost function needs to be learned either from human demonstrations prior to motor learning or from the robot's own exploration during motor learning with feedback from a teacher. In particular, we develop the learning framework shown in Fig. 1. In the proposed framework, both the scoring function and the motor policy are learned at the same time. However, the two learning systems do not have to be fully synchronized. For example, the human teacher may decide to give the robot feedback on only some of the executions and let the robot explore by itself most of the time.

Keywords

end effectors; feedback; iterative methods; learning (artificial intelligence); learning systems; robotic assembly; ball throwing; end-effector position; factory assembly line; goodness function; human demonstrations; imitation learning algorithms; learning systems; motor policy; motor skill acquisition; nontrivial cost function; ping-pong ball jiggling; robot feedback; scoring function; simultaneous sensory learning; system state; tendon driven hand; trajectory copying; trajectory improvement; Force; Humans; Learning systems; Machine learning; Robot sensing systems; Trajectory;

fLanguage

English

Publisher

IEEE

Conference_Titel

Development and Learning and Epigenetic Robotics (ICDL), 2012 IEEE International Conference on

Conference_Location

San Diego, CA

Print_ISBN

978-1-4673-4964-2

Electronic_ISBN

978-1-4673-4963-5

Type

conf

DOI

10.1109/DevLrn.2012.6400890

Filename

6400890