DocumentCode :
115709
Title :
Infinite time horizon maximum causal entropy inverse reinforcement learning
Author :
Bloem, Michael ; Bambos, Nicholas
Author_Institution :
Aviation Syst. Div., NASA Ames Res. Center, Moffett Field, CA, USA
fYear :
2014
fDate :
15-17 Dec. 2014
Firstpage :
4911
Lastpage :
4916
Abstract :
We extend the maximum causal entropy framework for inverse reinforcement learning to the infinite time horizon discounted reward setting. To do so, we maximize discounted future contributions to causal entropy subject to a discounted feature expectation matching constraint. A parameterized class of stochastic policies that solve this problem is referred to as soft Bellman policies because these policies can be specified in terms of values that satisfy an equation identical to the Bellman equation but with a softmax (the log of a sum of exponentials) instead of a max. Under some assumptions, algorithms that repeatedly solve for a soft Bellman policy, evaluate the policy, and then perform a gradient update on the parameters will find the optimal soft Bellman policy. For the first step, we extend techniques from dynamic programming and reinforcement learning so that they derive soft Bellman policies. For the second step, we can use policy evaluation techniques from dynamic programming or perform Monte Carlo simulations. We compare three algorithms of this type by applying them to a problem instance involving demonstration data from a simple controlled queuing network model inspired by problems in air traffic management.
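The softmax Bellman backup described in the abstract can be sketched in code. The following is an illustrative implementation only, not the authors' code: it assumes a finite tabular MDP with a hypothetical transition tensor `P` of shape (states, actions, states) and reward matrix `r`, and replaces the hard max of standard value iteration with a log-sum-exp.

```python
import numpy as np

def soft_value_iteration(P, r, gamma=0.9, iters=1000, tol=1e-10):
    """Iterate the soft Bellman equation
        Q(s, a) = r(s, a) + gamma * sum_{s'} P(s' | s, a) V(s')
        V(s)    = log sum_a exp(Q(s, a))          # softmax replaces max
    and return the soft Bellman policy pi(a | s) = exp(Q(s, a) - V(s)).

    P: (S, A, S) transition probabilities; r: (S, A) rewards (assumed names).
    """
    S, A, _ = P.shape
    V = np.zeros(S)
    for _ in range(iters):
        Q = r + gamma * (P @ V)                    # (S, A) soft Bellman backup
        Qmax = Q.max(axis=1)                       # shift for numerical stability
        V_new = Qmax + np.log(np.exp(Q - Qmax[:, None]).sum(axis=1))
        if np.max(np.abs(V_new - V)) < tol:
            V = V_new
            break
        V = V_new
    Q = r + gamma * (P @ V)
    pi = np.exp(Q - V[:, None])                    # stochastic policy; rows sum to 1
    return V, pi
```

In the full inverse reinforcement learning loop described above, `r` would be a parameterized reward whose parameters are updated by gradient steps after each policy evaluation; here it is fixed for illustration.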
Keywords :
Monte Carlo methods; air traffic; dynamic programming; entropy; learning (artificial intelligence); stochastic processes; Bellman equation; Monte Carlo simulations; air traffic management; controlled queuing network model; discounted feature expectation matching constraint; dynamic programming; infinite time horizon discounted reward setting; inverse reinforcement learning; maximum causal entropy; parameterized stochastic policies; policy evaluation techniques; soft Bellman policies; Context; Dynamic programming; Entropy; Finite element analysis; Heuristic algorithms; Stochastic processes; Vectors;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
2014 IEEE 53rd Annual Conference on Decision and Control (CDC)
Conference_Location :
Los Angeles, CA
Print_ISBN :
978-1-4799-7746-8
Type :
conf
DOI :
10.1109/CDC.2014.7040156
Filename :
7040156