• DocumentCode
    2498039
  • Title

    Safe reinforcement learning in high-risk tasks through policy improvement

  • Author

    Polo, Francisco Javier Garcia ; Rebollo, Fernando Fernandez

  • Author_Institution
    Comput. Sci. Dept., Univ. Carlos III de Madrid, Leganés, Spain
  • fYear
    2011
  • fDate
    11-15 April 2011
  • Firstpage
    76
  • Lastpage
    83
  • Abstract
    Reinforcement Learning (RL) methods are widely used for dynamic control tasks. Many of these are high-risk tasks in which the trial-and-error process may select actions whose execution from unsafe states can be catastrophic. In addition, many of these tasks have continuous state and action spaces, which makes the learning problem harder and impractical for conventional RL algorithms. So, when the agent begins to interact with a risky environment that has a large state-action space, an important question arises: how can we prevent the exploration of the state-action space from damaging the learning (or other) systems? In this paper, we define the concept of risk and address the problem of safe exploration in the context of RL. Our notion of safety is concerned with states that can lead to damage. Moreover, we introduce an algorithm that safely improves suboptimal but robust behaviors for continuous state and action control tasks, and that learns efficiently from the experience gathered from the environment. We report experimental results using the helicopter hovering task from the RL Competition.
  • Keywords
    aircraft control; control engineering computing; helicopters; learning (artificial intelligence); multi-agent systems; action control tasks; agent; dynamic control tasks; helicopter hovering task; high-risk tasks; policy improvement; reinforcement learning; trial and error process; Computer crashes; Helicopters; Mathematical model; Robots; Robustness; Safety; Trajectory
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Adaptive Dynamic Programming And Reinforcement Learning (ADPRL), 2011 IEEE Symposium on
  • Conference_Location
    Paris
  • Print_ISBN
    978-1-4244-9887-1
  • Type

    conf

  • DOI
    10.1109/ADPRL.2011.5967356
  • Filename
    5967356