DocumentCode
115709
Title
Infinite time horizon maximum causal entropy inverse reinforcement learning
Author
Bloem, Michael ; Bambos, Nicholas
Author_Institution
Aviation Syst. Div., NASA Ames Res. Center, Moffett Field, CA, USA
fYear
2014
fDate
15-17 Dec. 2014
Firstpage
4911
Lastpage
4916
Abstract
We extend the maximum causal entropy framework for inverse reinforcement learning to the infinite time horizon discounted reward setting. To do so, we maximize discounted future contributions to causal entropy subject to a discounted feature expectation matching constraint. A parameterized class of stochastic policies that solve this problem is referred to as soft Bellman policies because these policies can be specified in terms of values that satisfy an equation identical to the Bellman equation but with a softmax (the log of a sum of exponentials) in place of the max. Under some assumptions, algorithms that repeatedly solve for a soft Bellman policy, evaluate the policy, and then perform a gradient update on the parameters will find the optimal soft Bellman policy. For the first step, we extend techniques from dynamic programming and reinforcement learning so that they derive soft Bellman policies. For the second step, we can use policy evaluation techniques from dynamic programming or perform Monte Carlo simulations. We compare three algorithms of this type by applying them to a problem instance involving demonstration data from a simple controlled queuing network model inspired by problems in air traffic management.
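The soft Bellman backup described in the abstract (standard value iteration with the max over actions replaced by a log-sum-exp) can be sketched as follows. This is an illustrative tabular implementation under assumed inputs, not the paper's feature-parameterized formulation: `P` and `r` are a hypothetical toy transition model and reward table, while the paper instead parameterizes rewards via features matched against demonstrations.

```python
import numpy as np

def soft_value_iteration(P, r, gamma, iters=500):
    """Compute a soft Bellman policy for a small tabular MDP.

    P: (S, A, S) transition probabilities; r: (S, A) rewards.
    Both are toy stand-ins: the paper derives r from a feature
    expectation matching constraint rather than giving it directly.
    """
    S, A = r.shape
    V = np.zeros(S)
    for _ in range(iters):
        # Soft Bellman backup: Q(s,a) = r(s,a) + gamma * E[V(s')]
        Q = r + gamma * (P @ V)
        # Softmax (log-sum-exp) over actions replaces the usual max
        V = np.log(np.exp(Q).sum(axis=1))
    # Stochastic soft Bellman policy: pi(a|s) = exp(Q(s,a) - V(s))
    policy = np.exp(Q - V[:, None])
    return policy, V

# Toy 2-state, 2-action MDP (illustrative only)
P = np.zeros((2, 2, 2))
P[0, 0, 0] = 1.0; P[0, 1, 1] = 1.0
P[1, 0, 0] = 1.0; P[1, 1, 1] = 1.0
r = np.array([[0.0, 1.0],
              [1.0, 0.0]])
policy, V = soft_value_iteration(P, r, gamma=0.9)
```

Because the log-sum-exp never commits fully to one action, the resulting policy is stochastic, with probability mass concentrated on higher-value actions; this is the first step of the iterative solve/evaluate/gradient-update scheme the abstract outlines.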
Keywords
Monte Carlo methods; air traffic; dynamic programming; entropy; learning (artificial intelligence); stochastic processes; Bellman equation; Monte Carlo simulations; air traffic management; controlled queuing network model; discounted feature expectation matching constraint; infinite time horizon discounted reward setting; inverse reinforcement learning; maximum causal entropy; parameterized stochastic policies; policy evaluation techniques; soft Bellman policies; Context; Entropy; Finite element analysis; Heuristic algorithms; Stochastic processes; Vectors
fLanguage
English
Publisher
ieee
Conference_Titel
2014 IEEE 53rd Annual Conference on Decision and Control (CDC)
Conference_Location
Los Angeles, CA
Print_ISBN
978-1-4799-7746-8
Type
conf
DOI
10.1109/CDC.2014.7040156
Filename
7040156
Link To Document