• DocumentCode
    115709
  • Title
    Infinite time horizon maximum causal entropy inverse reinforcement learning
  • Author
    Bloem, Michael; Bambos, Nicholas
  • Author_Institution
    Aviation Syst. Div., NASA Ames Res. Center, Moffett Field, CA, USA
  • fYear
    2014
  • fDate
    15-17 Dec. 2014
  • Firstpage
    4911
  • Lastpage
    4916
  • Abstract
    We extend the maximum causal entropy framework for inverse reinforcement learning to the infinite time horizon discounted reward setting. To do so, we maximize discounted future contributions to causal entropy subject to a discounted feature expectation matching constraint. The parameterized stochastic policies that solve this problem are referred to as soft Bellman policies because they can be specified in terms of values that satisfy an equation identical to the Bellman equation, except with a softmax (the log of a sum of exponentials) in place of the max. Under some assumptions, algorithms that repeatedly solve for a soft Bellman policy, evaluate that policy, and then perform a gradient update on the parameters will find the optimal soft Bellman policy. For the first step, we extend techniques from dynamic programming and reinforcement learning so that they derive soft Bellman policies. For the second step, we can use policy evaluation techniques from dynamic programming or perform Monte Carlo simulations. We compare three algorithms of this type by applying them to a problem instance involving demonstration data from a simple controlled queuing network model inspired by problems in air traffic management.
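    To make the soft Bellman equation concrete, below is a minimal tabular sketch in Python of a soft value iteration, where the usual max over actions is replaced by a log-sum-exp; the function name, the array layout (P[a, s, s'] transitions, r[s, a] rewards), and the parameter defaults are illustrative assumptions for this sketch, not taken from the paper.

        import numpy as np

        def soft_value_iteration(P, r, gamma=0.9, tol=1e-8, max_iter=10_000):
            # Illustrative layout (assumption): P[a, s, s2] = Pr(s2 | s, a),
            # r[s, a] = reward for taking action a in state s.
            num_actions, num_states, _ = P.shape
            v = np.zeros(num_states)
            for _ in range(max_iter):
                # Soft Bellman backup: Q(s, a) = r(s, a) + gamma * E[v(s') | s, a]
                q = r + gamma * np.einsum("asn,n->sa", P, v)
                # Softmax replaces the max of ordinary value iteration:
                # v(s) = log sum_a exp(Q(s, a)), computed stably.
                m = q.max(axis=1, keepdims=True)
                v_new = (m + np.log(np.exp(q - m).sum(axis=1, keepdims=True)))[:, 0]
                if np.max(np.abs(v_new - v)) < tol:
                    v = v_new
                    break
                v = v_new
            # Stochastic soft Bellman policy: pi(a | s) = exp(Q(s, a) - v(s)),
            # which is a proper distribution because v(s) = logsumexp_a Q(s, a).
            q = r + gamma * np.einsum("asn,n->sa", P, v)
            pi = np.exp(q - v[:, None])
            return v, pi

    In the full algorithms the abstract describes, a step like this would alternate with policy evaluation (dynamic programming or Monte Carlo) and a gradient update of the reward parameters toward feature expectation matching.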
  • Keywords
    Monte Carlo methods; air traffic; dynamic programming; entropy; learning (artificial intelligence); stochastic processes; Bellman equation; Monte Carlo simulations; air traffic management; controlled queuing network model; discounted feature expectation matching constraint; infinite time horizon discounted reward setting; inverse reinforcement learning; maximum causal entropy; parameterized stochastic policies; policy evaluation techniques; soft Bellman policies; Context; Finite element analysis; Heuristic algorithms; Vectors
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    2014 IEEE 53rd Annual Conference on Decision and Control (CDC)
  • Conference_Location
    Los Angeles, CA, USA
  • Print_ISBN
    978-1-4799-7746-8
  • Type
    conf
  • DOI
    10.1109/CDC.2014.7040156
  • Filename
    7040156