DocumentCode
115709
Title
Infinite time horizon maximum causal entropy inverse reinforcement learning
Author
Bloem, Michael ; Bambos, Nicholas
Author_Institution
Aviation Syst. Div., NASA Ames Res. Center, Moffett Field, CA, USA
fYear
2014
fDate
15-17 Dec. 2014
Firstpage
4911
Lastpage
4916
Abstract
We extend the maximum causal entropy framework for inverse reinforcement learning to the infinite time horizon discounted reward setting. To do so, we maximize discounted future contributions to causal entropy subject to a discounted feature expectation matching constraint. A parameterized class of stochastic policies that solve this problem is referred to as soft Bellman policies because these policies can be specified in terms of values that satisfy an equation identical to the Bellman equation but with a softmax (the log of a sum of exponentials) in place of the max. Under some assumptions, algorithms that repeatedly solve for a soft Bellman policy, evaluate the policy, and then perform a gradient update on the parameters will find the optimal soft Bellman policy. For the first step, we extend techniques from dynamic programming and reinforcement learning so that they derive soft Bellman policies. For the second step, we can use policy evaluation techniques from dynamic programming or perform Monte Carlo simulations. We compare three algorithms of this type by applying them to a problem instance involving demonstration data from a simple controlled queuing network model inspired by problems in air traffic management.
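The soft Bellman backup described in the abstract (standard value iteration with the max over actions replaced by a log-sum-exp) can be sketched as follows. This is an illustrative tabular implementation under assumed inputs, not the paper's feature-parameterized formulation: `P` and `r` are a hypothetical toy transition model and reward table, while the paper instead parameterizes rewards via features matched against demonstrations.

```python
import numpy as np

def soft_value_iteration(P, r, gamma, iters=500):
    """Compute a soft Bellman policy for a small tabular MDP.

    P: (S, A, S) transition probabilities; r: (S, A) rewards.
    Both are toy stand-ins: the paper derives r from a feature
    expectation matching constraint rather than giving it directly.
    """
    S, A = r.shape
    V = np.zeros(S)
    for _ in range(iters):
        # Soft Bellman backup: Q(s,a) = r(s,a) + gamma * E[V(s')]
        Q = r + gamma * (P @ V)
        # Softmax (log-sum-exp) over actions replaces the usual max
        V = np.log(np.exp(Q).sum(axis=1))
    # Stochastic soft Bellman policy: pi(a|s) = exp(Q(s,a) - V(s))
    policy = np.exp(Q - V[:, None])
    return policy, V

# Toy 2-state, 2-action MDP (illustrative only)
P = np.zeros((2, 2, 2))
P[0, 0, 0] = 1.0; P[0, 1, 1] = 1.0
P[1, 0, 0] = 1.0; P[1, 1, 1] = 1.0
r = np.array([[0.0, 1.0],
              [1.0, 0.0]])
policy, V = soft_value_iteration(P, r, gamma=0.9)
```

Because the log-sum-exp never commits fully to one action, the resulting policy is stochastic, with probability mass concentrated on higher-value actions; this is the first step of the iterative solve/evaluate/gradient-update scheme the abstract outlines.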
Keywords
Monte Carlo methods; air traffic; dynamic programming; entropy; learning (artificial intelligence); stochastic processes; Bellman equation; Monte Carlo simulations; air traffic management; controlled queuing network model; discounted feature expectation matching constraint; infinite time horizon discounted reward setting; inverse reinforcement learning; maximum causal entropy; parameterized stochastic policies; policy evaluation techniques; soft Bellman policies; Context; Entropy; Finite element analysis; Heuristic algorithms; Stochastic processes; Vectors
fLanguage
English
Publisher
ieee
Conference_Titel
2014 IEEE 53rd Annual Conference on Decision and Control (CDC)
Conference_Location
Los Angeles, CA
Print_ISBN
978-1-4799-7746-8
Type
conf
DOI
10.1109/CDC.2014.7040156
Filename
7040156
Link To Document