Scaling life-long off-policy learning

Author

White, A.; Modayil, J.; Sutton, Richard S.

Author_Institution

Dept. of Comput. Sci., Univ. of Alberta, Edmonton, AB, Canada

fYear

2012

fDate

7-9 Nov. 2012

Firstpage

1

Lastpage

6

Abstract

In this paper we pursue an approach to scaling life-long learning using parallel off-policy reinforcement learning algorithms. In life-long learning, a robot continually learns from a life-time of experience, slowly acquiring and applying skills and knowledge to new situations. Many of the benefits of life-long learning are a result of scaling the amount of training data processed by the robot to long sensorimotor streams. Another dimension of scaling can be added by allowing off-policy sampling from the unending stream of sensorimotor data generated by a long-lived robot. Recent algorithmic developments have made it possible, for the first time, to apply off-policy algorithms to life-long learning in a sound way. We assess the scalability of these off-policy algorithms on a physical robot. We show that hundreds of accurate multi-step predictions can be learned about several policies in parallel and in real time. We present the first online measures of off-policy learning progress. Finally, we demonstrate that our robot, using the new off-policy measures, can learn 8000 predictions about 300 distinct policies, a substantial increase in scale compared to previous simulated and robotic life-long learning systems.

Keywords

control engineering computing; intelligent robots; learning (artificial intelligence); parallel processing; sampling methods; algorithm scalability; life-long learning scaling; life-time experience; multistep prediction; off-policy sampling; parallel off-policy reinforcement learning algorithm; physical robot; scaling dimension; sensorimotor data streams; skill acquisition; Approximation algorithms; Computer architecture; Function approximation; Prediction algorithms; Robot sensing systems; Vectors;

fLanguage

English

Publisher

IEEE

Conference_Titel

Development and Learning and Epigenetic Robotics (ICDL), 2012 IEEE International Conference on

Conference_Location

San Diego, CA

Print_ISBN

978-1-4673-4964-2

Electronic_ISBN

978-1-4673-4963-5

Type

conf

DOI

10.1109/DevLrn.2012.6400860

Filename

6400860