DocumentCode :
115709
Title :
Infinite time horizon maximum causal entropy inverse reinforcement learning
Author :
Bloem, Michael ; Bambos, Nicholas
Author_Institution :
Aviation Syst. Div., NASA Ames Res. Center, Moffett Field, CA, USA
fYear :
2014
fDate :
15-17 Dec. 2014
Firstpage :
4911
Lastpage :
4916
Abstract :
We extend the maximum causal entropy framework for inverse reinforcement learning to the infinite time horizon discounted reward setting. To do so, we maximize discounted future contributions to causal entropy subject to a discounted feature expectation matching constraint. A parameterized class of stochastic policies that solve this problem is referred to as soft Bellman policies because these policies can be specified in terms of values that satisfy an equation identical to the Bellman equation but with a softmax (the log of a sum of exponentials) instead of a max. Under some assumptions, algorithms that repeatedly solve for a soft Bellman policy, evaluate the policy, and then perform a gradient update on the parameters will find the optimal soft Bellman policy. For the first step, we extend techniques from dynamic programming and reinforcement learning so that they derive soft Bellman policies. For the second step, we can use policy evaluation techniques from dynamic programming or perform Monte Carlo simulations. We compare three algorithms of this type by applying them to a problem instance involving demonstration data from a simple controlled queuing network model inspired by problems in air traffic management.
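The softmax Bellman backup described in the abstract can be sketched in code. The following is an illustrative implementation only, not the authors' code: it assumes a finite tabular MDP with a hypothetical transition tensor `P` of shape (states, actions, states) and reward matrix `r`, and replaces the hard max of standard value iteration with a log-sum-exp.

```python
import numpy as np

def soft_value_iteration(P, r, gamma=0.9, iters=1000, tol=1e-10):
    """Iterate the soft Bellman equation
        Q(s, a) = r(s, a) + gamma * sum_{s'} P(s' | s, a) V(s')
        V(s)    = log sum_a exp(Q(s, a))          # softmax replaces max
    and return the soft Bellman policy pi(a | s) = exp(Q(s, a) - V(s)).

    P: (S, A, S) transition probabilities; r: (S, A) rewards (assumed names).
    """
    S, A, _ = P.shape
    V = np.zeros(S)
    for _ in range(iters):
        Q = r + gamma * (P @ V)                    # (S, A) soft Bellman backup
        Qmax = Q.max(axis=1)                       # shift for numerical stability
        V_new = Qmax + np.log(np.exp(Q - Qmax[:, None]).sum(axis=1))
        if np.max(np.abs(V_new - V)) < tol:
            V = V_new
            break
        V = V_new
    Q = r + gamma * (P @ V)
    pi = np.exp(Q - V[:, None])                    # stochastic policy; rows sum to 1
    return V, pi
```

In the full inverse reinforcement learning loop described above, `r` would be a parameterized reward whose parameters are updated by gradient steps after each policy evaluation; here it is fixed for illustration.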
Keywords :
Monte Carlo methods; air traffic; dynamic programming; entropy; learning (artificial intelligence); stochastic processes; Bellman equation; Monte Carlo simulations; air traffic management; controlled queuing network model; discounted feature expectation matching constraint; dynamic programming; infinite time horizon discounted reward setting; inverse reinforcement learning; maximum causal entropy; parameterized stochastic policies; policy evaluation techniques; soft Bellman policies; Context; Dynamic programming; Entropy; Finite element analysis; Heuristic algorithms; Stochastic processes; Vectors;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
2014 IEEE 53rd Annual Conference on Decision and Control (CDC)
Conference_Location :
Los Angeles, CA
Print_ISBN :
978-1-4799-7746-8
Type :
conf
DOI :
10.1109/CDC.2014.7040156
Filename :
7040156