• DocumentCode
    8519
  • Title
    Continuous action reinforcement learning for control-affine systems with unknown dynamics
  • Author
    Faust, Aleksandra; Ruymgaart, Peter; Salman, Molly; Fierro, Rafael; Tapia, Lydia
  • Author_Institution
    Dept. of Comput. Sci., Univ. of New Mexico, Albuquerque, NM, USA
  • Volume
    1
  • Issue
    3
  • fYear
    2014
  • fDate
    July 2014
  • Firstpage
    323
  • Lastpage
    336
  • Abstract
    Control of nonlinear systems is challenging in real time. Decision making, performed many times per second, must ensure system safety. Designing an input to perform a task often involves solving a nonlinear system of differential equations, a computationally intensive, if not intractable, problem. This article proposes sampling-based task learning for control-affine nonlinear systems through combined learning of the state-value and action-value functions in a model-free approximate value iteration setting with continuous inputs. A quadratic negative definite state-value function implies the existence of a unique maximum of the action-value function at any state. This allows the standard greedy policy to be replaced with a computationally efficient policy approximation that guarantees progression to a goal state without knowledge of the system dynamics. The policy approximation is consistent, i.e., it does not depend on the action samples used to calculate it. The method is suited to mechanical systems with high-dimensional input spaces and unknown dynamics performing Constraint-Balancing Tasks. We verify it both in simulation and experimentally for an unmanned aerial vehicle (UAV) carrying a suspended load, and in simulation for the rendezvous of heterogeneous robots.
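    The policy approximation described in the abstract can be illustrated with a minimal sketch. Assuming control-affine dynamics s' = f(s) + G(s)u with f and G unknown, a learned quadratic negative definite state-value function V, and a black-box one-step simulator (the names step and axial_policy, and the choice of three samples per axis, are illustrative assumptions rather than the paper's API), Q(s, u) = V(s') is concave quadratic in each input coordinate, so its maximizer along an axis can be recovered from a handful of action samples by fitting a parabola instead of searching greedily over the action space:

    import numpy as np

    def axial_policy(s, V, step, u_min, u_max, n_inputs):
        """Approximate argmax_u V(step(s, u)) one input axis at a time.

        Sketch only: assumes u_min[i] < 0 < u_max[i] so the three action
        samples per axis are distinct, and that V is concave quadratic
        along each input coordinate (control-affine dynamics, quadratic
        negative definite V)."""
        u_star = np.zeros(n_inputs)
        for i in range(n_inputs):
            # Three action samples along axis i; a parabola is fully
            # determined by any three distinct points.
            samples = np.array([u_min[i], 0.0, u_max[i]])
            q_vals = []
            for a in samples:
                u = u_star.copy()
                u[i] = a
                q_vals.append(V(step(s, u)))   # one-step lookahead value
            # Fit q(a) = c2*a^2 + c1*a + c0 and take the vertex.
            c2, c1, _ = np.polyfit(samples, q_vals, 2)
            if c2 < 0:   # concave fit: interior maximum at the vertex
                u_star[i] = np.clip(-c1 / (2.0 * c2), u_min[i], u_max[i])
            else:        # degenerate fit: fall back to the best sample
                u_star[i] = samples[int(np.argmax(q_vals))]
        return u_star

    Because a parabola is determined by any three distinct points, the recovered maximizer does not depend on which action samples were used, which is the consistency property the abstract claims; the per-step cost grows linearly with the input dimension rather than with the size of a discretized action set.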
  • Keywords
    aerospace computing; approximation theory; autonomous aerial vehicles; decision making; differential equations; learning (artificial intelligence); nonlinear control systems; simulation; UAV; action-value functions; computationally efficient policy approximation; constraint-balancing task; continuous action reinforcement learning; control-affine nonlinear systems; decision making; differential equations; heterogeneous robot rendezvous; high-dimensional input spaces; mechanical systems; model-free approximate value iteration setting; quadratic negative definite state-value function; sampling-based task learning; simulation; standard greedy policy; state-value functions; suspended load; unknown dynamics; unmanned aerial vehicles; Control systems; Function approximation; Learning (artificial intelligence); Nonlinear dynamical systems; Robots; Vehicle dynamics; Reinforcement learning; approximate value iteration; continuous action spaces; control-affine nonlinear systems; fitted value iteration; policy approximation;
  • fLanguage
    English
  • Journal_Title
    IEEE/CAA Journal of Automatica Sinica
  • Publisher
    IEEE
  • ISSN
    2329-9266
  • Type
    jour
  • DOI
    10.1109/JAS.2014.7004690
  • Filename
    7004690