DocumentCode
186269
Title
Inverse reinforcement learning using Dynamic Policy Programming
Author
Uchibe, Eiji; Doya, Kenji
Author_Institution
Neural Computation Unit, Okinawa Institute of Science and Technology Graduate University, Okinawa, Japan
fYear
2014
fDate
13-16 Oct. 2014
Firstpage
222
Lastpage
228
Abstract
This paper proposes a novel model-free inverse reinforcement learning method based on density ratio estimation under the framework of Dynamic Policy Programming. We show that the logarithm of the ratio between the optimal policy and the baseline policy is represented by the state-dependent cost and the value function. Our proposal is to use density ratio estimation methods to estimate the ratio of policy densities, and regularized least squares to estimate the state-dependent cost and the value function that satisfy this relation. Our method avoids computing integrals such as the partition function. Simple numerical simulations of grid-world navigation, car driving, and pendulum swing-up demonstrate its superiority over conventional methods.
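The abstract describes a two-step pipeline: estimate the log density ratio between optimal-policy and baseline-policy transitions, then fit a state-dependent cost and a value function to that ratio by regularized least squares. The following is a minimal sketch, not the authors' implementation: it assumes an LMDP-style relation ln pi(y|x)/b(y|x) = -q(x) + V(x) - V(y) (the paper's Dynamic Policy Programming formulation may use different sign and discount conventions), uses logistic-regression classification as one standard density ratio estimator (the paper may use a different one, e.g. uLSIF or KLIEP), and assumes linear features phi; all names and the toy data are illustrative.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression, Ridge

rng = np.random.default_rng(0)

# --- Step 1: density ratio estimation via probabilistic classification ---
# Transition pairs (x, y) from the optimal policy are labeled 1, pairs from
# the baseline policy are labeled 0. For equal sample sizes, the logistic
# regression log-odds equal the estimated log density ratio ln pi/b.
def estimate_log_ratio(pairs_opt, pairs_base):
    X = np.vstack([pairs_opt, pairs_base])
    t = np.concatenate([np.ones(len(pairs_opt)), np.zeros(len(pairs_base))])
    clf = LogisticRegression(C=1.0).fit(X, t)
    return lambda pairs: clf.decision_function(pairs)  # log-odds = log ratio

# --- Step 2: regularized least squares for cost q and value V ---
# With linear models q(x) = w_q . phi(x) and V(x) = w_v . phi(x), the assumed
# relation  log_ratio(x, y) = -q(x) + V(x) - V(y)  is linear in (w_q, w_v),
# so ridge regression recovers both weight vectors jointly.
def fit_cost_and_value(x, y, log_ratio, phi, l2=1e-2):
    F = np.hstack([-phi(x), phi(x) - phi(y)])  # stacked features for [w_q, w_v]
    reg = Ridge(alpha=l2, fit_intercept=False).fit(F, log_ratio)
    d = phi(x).shape[1]
    return reg.coef_[:d], reg.coef_[d:]  # (w_q, w_v)

# Toy usage on random 1-D states (illustration only, not the paper's tasks):
phi = lambda s: np.hstack([s, s**2, np.ones((len(s), 1))])
x_opt = rng.normal(size=(500, 1))
y_opt = x_opt + rng.normal(0.1, 0.2, size=(500, 1))
x_base = rng.normal(size=(500, 1))
y_base = x_base + rng.normal(0.0, 0.5, size=(500, 1))
log_r = estimate_log_ratio(np.hstack([x_opt, y_opt]), np.hstack([x_base, y_base]))
w_q, w_v = fit_cost_and_value(x_opt, y_opt, log_r(np.hstack([x_opt, y_opt])), phi)
```

Note that both steps work purely from sampled transitions, which is why no partition function (and no integral over the state space) ever needs to be evaluated.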
Keywords
dynamic programming; estimation theory; learning (artificial intelligence); least squares approximations; car driving; density ratio estimation method; dynamic policy programming; grid world navigation; inverse reinforcement learning; least squares method; pendulum swing-up; state-dependent cost; value function; Cost function; Estimation; Learning (artificial intelligence); Mathematical model; Navigation; Trajectory; Vectors
fLanguage
English
Publisher
IEEE
Conference_Title
2014 Joint IEEE International Conferences on Development and Learning and Epigenetic Robotics (ICDL-Epirob)
Conference_Location
Genoa
Type
conf
DOI
10.1109/DEVLRN.2014.6982985
Filename
6982985