• DocumentCode
    2590736
  • Title

    Adaptive exploration for continual reinforcement learning

  • Author

    Stulp, Freek

  • fYear
    2012
  • fDate
    7-12 Oct. 2012
  • Firstpage
    1631
  • Lastpage
    1636
  • Abstract
    Most experiments on policy search for robotics focus on isolated tasks, where the experiment is split into two distinct phases: (1) the learning phase, where the robot learns the task through exploration; (2) the exploitation phase, where exploration is turned off, and the robot demonstrates its performance on the task it has learned. In this paper, we present an algorithm that enables robots to continually and autonomously alternate between these phases. We do so by combining the 'Policy Improvement with Path Integrals' direct reinforcement learning algorithm with the covariance matrix adaptation rule from the 'Cross-Entropy Method' optimization algorithm. This integration is possible because both algorithms iteratively update parameters with probability-weighted averaging. A practical advantage of the novel algorithm, called PI2-CMA, is that it relieves the user of having to manually tune the degree of exploration. We evaluate PI2-CMA's ability to continually and autonomously tune exploration on two tasks.
  • Keywords
    covariance matrices; entropy; intelligent robots; learning (artificial intelligence); mobile robots; optimisation; probability; PI2-CMA algorithm; continual reinforcement learning adaptive exploration degree; covariance matrix adaptation rule; cross-entropy method optimization algorithm; exploitation phase; iterative parameter update; policy improvement-with-path integrals direct reinforcement learning algorithm; probability-weighted averaging; robot learning phase; Convergence; Cost function; Covariance matrix; Learning; Robots; Trajectory; Visualization;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Intelligent Robots and Systems (IROS), 2012 IEEE/RSJ International Conference on
  • Conference_Location
    Vilamoura
  • ISSN
    2153-0858
  • Print_ISBN
    978-1-4673-1737-5
  • Type

    conf

  • DOI
    10.1109/IROS.2012.6385818
  • Filename
    6385818
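
The abstract states that both combined algorithms update parameters with probability-weighted averaging. The sketch below illustrates that general idea only and is not the paper's implementation: the function name `weighted_update`, the exponentiation constant `h`, the cost normalization, and the choice to measure deviations from the previous mean are all assumptions made for illustration.

```python
import math


def weighted_update(mean, samples, costs, h=10.0):
    """One probability-weighted averaging step (illustrative sketch).

    mean    -- current parameter vector (list of floats)
    samples -- sampled parameter vectors, one per rollout
    costs   -- scalar cost of each rollout (lower is better)
    h       -- hypothetical constant controlling how sharply low costs are favored
    """
    lo, hi = min(costs), max(costs)
    span = (hi - lo) or 1.0  # avoid division by zero when all costs are equal

    # Map costs to probabilities: low cost gives high weight.
    weights = [math.exp(-h * (c - lo) / span) for c in costs]
    total = sum(weights)
    probs = [w / total for w in weights]

    dim = len(mean)

    # New mean: probability-weighted average of the sampled parameters.
    new_mean = [
        sum(p * s[i] for p, s in zip(probs, samples)) for i in range(dim)
    ]

    # New covariance: probability-weighted average of outer products of the
    # deviations from the previous mean (an assumption of this sketch).
    new_cov = [
        [
            sum(
                p * (s[i] - mean[i]) * (s[j] - mean[j])
                for p, s in zip(probs, samples)
            )
            for j in range(dim)
        ]
        for i in range(dim)
    ]
    return new_mean, new_cov, probs


if __name__ == "__main__":
    # Three hand-picked samples around the current mean, with made-up costs.
    current_mean = [1.0, 1.0]
    candidates = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
    rollout_costs = [0.0, 1.0, 2.0]
    print(weighted_update(current_mean, candidates, rollout_costs))
```

In this sketch the returned covariance would determine how widely the next batch of samples is spread, which is how the degree of exploration could adapt without manual tuning.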