Following Newton direction in Policy Gradient with parameter exploration

Author

Giorgio Manganini; Matteo Pirotta; Marcello Restelli; Luca Bascetta

Author_Institution

Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, Piazza Leonardo da Vinci 32, 20133, Italy

fYear

2015

fDate

7/1/2015

Firstpage

Lastpage

Abstract

This paper investigates the use of second-order methods to solve Markov Decision Processes (MDPs). Despite the popularity of second-order methods in the optimization literature, little attention has so far been paid to extending such techniques to sequential decision problems. Here we provide a model-free Reinforcement Learning method that estimates the Newton direction by sampling directly in the parameter space. To compute the Newton direction, we provide a formulation of the Hessian of the expected return, a technique for variance reduction in the sample-based estimation, and a finite-sample analysis for the case of a Normal distribution. Besides discussing the theoretical properties, we empirically evaluate the method on an instructional linear-quadratic regulator and on a complex dynamical quadrotor system.
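
As a rough illustration of what "estimating the Newton direction by sampling in the parameter space" can look like, the sketch below draws parameters from a Gaussian with mean mu and covariance sigma^2 I, and forms Monte Carlo estimates of the gradient and Hessian of the expected return with respect to mu using the standard likelihood-ratio identities. It is not taken from the paper: the function name `newton_direction`, the restriction to two dimensions, the isotropic covariance, and the sample-mean baseline (a simple stand-in for the paper's variance-reduction technique) are all assumptions made for illustration.

```python
import random


def newton_direction(ret, mu, sigma, n_samples, seed=0):
    """Estimate the Newton direction of J(mu) = E[ret(theta)],
    with theta ~ N(mu, sigma^2 I), in two dimensions (illustrative sketch)."""
    rng = random.Random(seed)

    # Sample standard-normal perturbations and evaluate the return
    # at theta = mu + sigma * eps.
    eps = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(n_samples)]
    rets = [ret((mu[0] + sigma * e0, mu[1] + sigma * e1)) for e0, e1 in eps]

    # Assumption: a constant baseline equal to the sample mean of the returns.
    baseline = sum(rets) / n_samples

    g = [0.0, 0.0]                # gradient estimate
    h = [[0.0, 0.0], [0.0, 0.0]]  # Hessian estimate
    for e, r in zip(eps, rets):
        w = (r - baseline) / n_samples
        for i in range(2):
            # Score of the Gaussian w.r.t. mu: (theta - mu) / sigma^2 = eps / sigma.
            g[i] += w * e[i] / sigma
            for j in range(2):
                # Score outer product plus Hessian of the log-density:
                # (eps eps^T - I) / sigma^2.
                h[i][j] += w * (e[i] * e[j] - (1.0 if i == j else 0.0)) / sigma ** 2

    # Newton direction d solves H d = -g (explicit 2x2 inverse).
    det = h[0][0] * h[1][1] - h[0][1] * h[1][0]
    d0 = -(h[1][1] * g[0] - h[0][1] * g[1]) / det
    d1 = -(-h[1][0] * g[0] + h[0][0] * g[1]) / det
    return [d0, d1]
```

For a concave quadratic return such as ret(theta) = -(theta_0^2 + theta_1^2), the exact Newton direction from mu is -mu, so the estimate returned by `newton_direction` should approach that value as the number of samples grows.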

Keywords

Complexity theory

Publisher

IEEE

Conference_Title

Neural Networks (IJCNN), 2015 International Joint Conference on

Electronic_ISBN

2161-4407

Type

conf

DOI

10.1109/IJCNN.2015.7280673

Filename

7280673

Link To Document

https://search.isc.ac/dl/search/defaultta.aspx?DTC=49&DC=3661360