• DocumentCode
    2498528
  • Title

    Active exploration for robot parameter selection in episodic reinforcement learning

  • Author

    Kroemer, Oliver ; Peters, Jan

  • Author_Institution
    Max Planck Inst., Tübingen, Germany
  • fYear
    2011
  • fDate
    11-15 April 2011
  • Firstpage
    25
  • Lastpage
    31
  • Abstract
    As the complexity of robots and other autonomous systems increases, it becomes more important that these systems can adapt and optimize their settings actively. However, such optimization is rarely trivial. Sampling from the system is often expensive in terms of time and other costs, and excessive sampling should therefore be avoided. The parameter space is also usually continuous and multi-dimensional. Given the inherent exploration-exploitation dilemma of the problem, we propose treating it as an episodic reinforcement learning problem. In this reinforcement learning framework, the policy is defined by the system's parameters and the rewards are given by the system's performance. The rewards accumulate during each episode of a task. In this paper, we present a method for efficiently sampling and optimizing in continuous multidimensional spaces. The approach is based on Gaussian process regression, which can represent continuous non-linear mappings from parameters to system performance. We employ an upper confidence bound policy, which explicitly manages the trade-off between exploration and exploitation. Unlike many other policies for this kind of problem, we do not rely on a discretization of the action space. The presented method was evaluated on a real robot. The robot had to learn grasping parameters in order to adapt its grasping execution to different objects. The proposed method was also tested on a more general gain tuning problem. The results of the experiments show that the presented method can quickly determine suitable parameters and is applicable to real online learning applications.
  • Keywords
    Gaussian processes; computational complexity; learning (artificial intelligence); manipulators; mobile robots; regression analysis; Gaussian process regression; autonomous systems; continuous nonlinear mappings; episodic reinforcement learning problem; exploration-exploitation dilemma; general gain tuning problem; robot complexity; robot parameter selection; Convergence; Grasping; Ground penetrating radar; Kernel; Robots; Tuning; Upper bound;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Adaptive Dynamic Programming And Reinforcement Learning (ADPRL), 2011 IEEE Symposium on
  • Conference_Location
    Paris
  • Print_ISBN
    978-1-4244-9887-1
  • Type

    conf

  • DOI
    10.1109/ADPRL.2011.5967378
  • Filename
    5967378
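
The abstract describes fitting a Gaussian process to the mapping from parameters to episodic return and choosing the next parameters with an upper confidence bound. The following is a minimal sketch of that general idea, not the authors' implementation. Everything named here is an assumption for illustration: the squared-exponential kernel and its length scale, the weight `kappa`, the toy `episode_return` function standing in for a real robot episode, and the unit-cube parameter range. The sketch also maximizes the bound by scoring random continuous candidates, a simple substitute for whatever continuous optimizer the paper uses.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=0.2, signal_var=1.0):
    """Squared-exponential kernel between point sets a (n, d) and b (m, d)."""
    sq_dist = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return signal_var * np.exp(-0.5 * sq_dist / length_scale ** 2)

def gp_posterior(x_train, y_train, x_query, noise_var=1e-4):
    """Zero-mean GP regression: posterior mean and std at x_query."""
    k_tt = rbf_kernel(x_train, x_train) + noise_var * np.eye(len(x_train))
    k_qt = rbf_kernel(x_query, x_train)
    chol = np.linalg.cholesky(k_tt)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))
    mean = k_qt @ alpha
    v = np.linalg.solve(chol, k_qt.T)
    prior_var = 1.0  # matches the kernel's default signal_var
    var = np.clip(prior_var - np.sum(v ** 2, axis=0), 0.0, None)
    return mean, np.sqrt(var)

def select_parameters(x_train, y_train, rng, dim, kappa=2.0, n_candidates=2000):
    """Pick the candidate with the highest upper confidence bound.

    Assumption: random candidates in [0, 1]^dim approximate a continuous
    maximization of mean + kappa * std.
    """
    candidates = rng.uniform(0.0, 1.0, size=(n_candidates, dim))
    mean, std = gp_posterior(x_train, y_train, candidates)
    return candidates[np.argmax(mean + kappa * std)]

def episode_return(theta):
    """Hypothetical stand-in for one robot episode; peak return at theta = 0.7."""
    return float(np.exp(-np.sum((theta - 0.7) ** 2) / 0.05))

def run(n_episodes=30, dim=2, seed=0):
    """Alternate between selecting parameters and observing one episode."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 1.0, size=(1, dim))
    y = np.array([episode_return(x[0])])
    for _ in range(n_episodes):
        theta = select_parameters(x, y, rng, dim)
        x = np.vstack([x, theta])
        y = np.append(y, episode_return(theta))
    return x[np.argmax(y)], float(np.max(y))

if __name__ == "__main__":
    best_theta, best_return = run()
    print("best parameters:", best_theta, "return:", best_return)
```

A larger `kappa` weights the posterior standard deviation more heavily and so favours unexplored regions, while a smaller value favours parameters already predicted to perform well.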