• DocumentCode
    586572
  • Title
    Scaling life-long off-policy learning
  • Author
    White, A.; Modayil, J.; Sutton, Richard S.
  • Author_Institution
    Dept. of Comput. Sci., Univ. of Alberta, Edmonton, AB, Canada
  • fYear
    2012
  • fDate
    7-9 Nov. 2012
  • Firstpage
    1
  • Lastpage
    6
  • Abstract
    In this paper we pursue an approach to scaling life-long learning using parallel off-policy reinforcement learning algorithms. In life-long learning, a robot continually learns from a life-time of experience, slowly acquiring and applying skills and knowledge to new situations. Many of the benefits of life-long learning are a result of scaling the amount of training data processed by the robot to long sensorimotor streams. Another dimension of scaling can be added by allowing off-policy sampling from the unending stream of sensorimotor data generated by a long-lived robot. Recent algorithmic developments have made it possible, for the first time, to apply off-policy algorithms to life-long learning in a sound way. We assess the scalability of these off-policy algorithms on a physical robot. We show that hundreds of accurate multi-step predictions can be learned about several policies in parallel and in real time. We present the first online measures of off-policy learning progress. Finally, we demonstrate that our robot, using the new off-policy measures, can learn 8000 predictions about 300 distinct policies, a substantial increase in scale compared to previous simulated and robotic life-long learning systems.
  • Keywords
    control engineering computing; intelligent robots; learning (artificial intelligence); parallel processing; sampling methods; algorithm scalability; life-long learning scaling; life-time experience; multistep prediction; off-policy sampling; parallel off-policy reinforcement learning algorithm; physical robot; scaling dimension; sensorimotor data streams; skill acquisition; Approximation algorithms; Computer architecture; Function approximation; Prediction algorithms; Robot sensing systems; Vectors;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Development and Learning and Epigenetic Robotics (ICDL), 2012 IEEE International Conference on
  • Conference_Location
    San Diego, CA
  • Print_ISBN
    978-1-4673-4964-2
  • Electronic_ISBN
    978-1-4673-4963-5
  • Type
    conf
  • DOI
    10.1109/DevLrn.2012.6400860
  • Filename
    6400860