Regional Information Center for Science and Technology - Fuzzy C-means method for representation policy iteration in reinforcement learning

DocumentCode :

2165777

Title :

Fuzzy C-means method for representation policy iteration in reinforcement learning

Author :

Huang, Zhenhua ; Xu, Xin ; Wu, Jun ; Zuo, Lei

Author_Institution :

Coll. of Mechatron. & Autom., Nat. Univ. of Defense Technol., Changsha, China

fYear :

2012

fDate :

11-14 April 2012

Firstpage :

175

Lastpage :

180

Abstract :

This paper introduces the Fuzzy C-means (FCM) method as the subsampling method for Representation Policy Iteration (RPI) in reinforcement learning. RPI is a new class of algorithms that automatically learns both the basis functions and an approximately optimal policy. The RPI procedure used in this paper is as follows. First, samples are collected using a random or guided policy. Next, a subset of these samples is obtained by applying FCM as the subsampling method. Then global basis functions, called proto-value functions (PVFs), are formed from the eigenfunctions of the graph Laplacian operator on an undirected graph constructed from the subset samples. Finally, least-squares policy iteration (LSPI) is used as the parameter estimation method to learn an approximately optimal policy. Illustrative experiments on an inverted pendulum problem compare the performance of RPI using FCM subsampling with that of RPI using the previous subsampling method.
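The subsampling and basis-construction steps described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the record does not say how the subset is chosen from the FCM result, so the sketch assumes that the sample nearest each cluster center is kept. The k-nearest-neighbour graph, the combinatorial Laplacian L = D - W, and every function name and parameter (`fuzzy_c_means`, `fcm_subsample`, `proto_value_functions`, `m`, `k_neighbors`, `n_basis`) are likewise assumptions made for illustration. The LSPI step is not shown.

```python
import numpy as np


def fuzzy_c_means(X, c, m=2.0, n_iter=100, tol=1e-5, seed=0):
    """Standard FCM: returns (centers, membership matrix U of shape (n, c))."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Random initial memberships, each row normalised to sum to 1.
    U = rng.random((n, c))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        W = U ** m
        # Centers are membership-weighted means of the samples.
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.maximum(d, 1e-12)  # avoid division by zero
        # u_ij = d_ij^(-2/(m-1)) / sum_k d_ik^(-2/(m-1))
        inv = d ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U


def fcm_subsample(X, c, **kwargs):
    """Assumed selection rule: keep the sample closest to each FCM center."""
    centers, _ = fuzzy_c_means(X, c, **kwargs)
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    idx = np.unique(d.argmin(axis=0))  # one sample index per center, deduplicated
    return X[idx], idx


def proto_value_functions(S, k_neighbors=5, n_basis=10):
    """Smoothest eigenvectors of the Laplacian of an undirected k-NN graph on S."""
    n = S.shape[0]
    d = np.linalg.norm(S[:, None, :] - S[None, :, :], axis=2)
    # Column 0 of the argsort is the point itself (distance 0), so skip it.
    nn = np.argsort(d, axis=1)[:, 1:k_neighbors + 1]
    W = np.zeros((n, n))
    rows = np.repeat(np.arange(n), nn.shape[1])
    W[rows, nn.ravel()] = 1.0
    W = np.maximum(W, W.T)  # symmetrise so the graph is undirected
    L = np.diag(W.sum(axis=1)) - W  # combinatorial graph Laplacian
    eigvals, eigvecs = np.linalg.eigh(L)  # eigenvalues in ascending order
    return eigvals[:n_basis], eigvecs[:, :n_basis]
```

With `X` holding the collected states as rows (for an inverted pendulum, for example, angle and angular velocity), `S, idx = fcm_subsample(X, c=50)` followed by `proto_value_functions(S)` would give basis vectors defined on the subset samples. Extending them to states outside the subset requires an additional interpolation step that this sketch omits.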

Keywords :

fuzzy set theory; iterative methods; learning (artificial intelligence); least squares approximations; FCM; LSPI; PVF; RPI; eigenfunctions; fuzzy C-means method; graph Laplacian operator; inverted pendulum problem; least square policy iteration; optimal policy approximation; parameter estimation method; policy iteration representation; proto value functions; reinforcement learning; representation policy iteration; subsampling method; subset samples; undirected graph construction; Approximation algorithms; Clustering algorithms; Economic indicators; Eigenvalues and eigenfunctions; Function approximation; Laplace equations; Reinforcement learning; fuzzy c-means; graph laplacian operator; proto-value function; representation policy iteration; subsampling;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Networking, Sensing and Control (ICNSC), 2012 9th IEEE International Conference on

Conference_Location :

Beijing

Print_ISBN :

978-1-4673-0388-0

Type :

conf

DOI :

10.1109/ICNSC.2012.6204912

Filename :

6204912

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2165777