Regional Information Center for Science and Technology (RICEST) - A continuous estimation of distribution algorithm by evolving graph structures using reinforcement learning

DocumentCode :

2821012

Title :

A continuous estimation of distribution algorithm by evolving graph structures using reinforcement learning

Author :

Li, Xianneng ; Li, Bing ; Mabu, Shingo ; Hirasawa, Kotaro

Author_Institution :

Grad. Sch. of Inf., Waseda Univ., Kitakyushu, Japan

fYear :

2012

fDate :

10-15 June 2012

Firstpage :

Lastpage :

Abstract :

A novel graph-based Estimation of Distribution Algorithm (EDA) named Probabilistic Model Building Genetic Network Programming (PMBGNP) has been proposed. Inspired by classical EDAs, PMBGNP memorizes the current best individuals and uses them to estimate a distribution from which the new population is generated. Unlike classical EDAs, however, PMBGNP can evolve compact programs by representing its solutions as graph structures. It can therefore solve a range of problems that differ from the conventional ones in the EDA literature, such as data mining and Reinforcement Learning (RL) problems. This paper extends PMBGNP from discrete to continuous search spaces; the extension is named PMBGNP-AC. Besides evolving the node connections to determine the optimal graph structures using conventional PMBGNP, a Gaussian distribution is used to model the continuous variables of nodes. The mean value μ and standard deviation σ are constructed like those of classical continuous Population-based incremental learning (PBILc). However, an RL technique, i.e., Actor-Critic (AC), is designed to update the parameters (μ and σ). AC allows us to calculate the Temporal-Difference (TD) error, which evaluates whether the selection of a continuous value is better or worse than expected. This scalar reinforcement signal decides whether the tendency to select this continuous value should be strengthened or weakened, which in turn determines the shape of the probability density function of the Gaussian distribution. The proposed algorithm is applied to an RL problem, i.e., autonomous robot control, where the robot's wheel speeds and sensor values are continuous. The experimental results show the superiority of PMBGNP-AC compared with the conventional algorithms.
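
The abstract does not give the update equations, so the following Python sketch only illustrates the general idea of a TD-error-driven Gaussian update. It is not the paper's method: the function names (td_error, sample_value, update_gaussian), the learning rate lr, the discount gamma, the lower bound sigma_min, and the specific update form (a Gaussian policy-gradient style rule) are all assumptions made for illustration.

```python
import random


def td_error(reward, value, next_value=0.0, gamma=0.9):
    """One-step TD error: positive means 'better than expected'."""
    return reward + gamma * next_value - value


def sample_value(mu, sigma, rng=random):
    """Draw a continuous value (e.g. a wheel speed) from N(mu, sigma)."""
    return rng.gauss(mu, sigma)


def update_gaussian(mu, sigma, x, delta, lr=0.1, sigma_min=1e-3):
    """Shift mu and sigma using the scalar TD error delta.

    Assumed rule (not taken from the paper):
      mu    <- mu    + lr * delta * (x - mu)
      sigma <- sigma + lr * delta * ((x - mu)^2 - sigma^2) / sigma
    A positive delta strengthens the tendency to select x;
    a negative delta weakens it.
    """
    new_mu = mu + lr * delta * (x - mu)
    new_sigma = sigma + lr * delta * ((x - mu) ** 2 - sigma ** 2) / sigma
    # Keep sigma strictly positive so sampling stays well defined.
    return new_mu, max(new_sigma, sigma_min)


if __name__ == "__main__":
    mu, sigma = 0.0, 1.0
    x = 0.5  # a sampled value, fixed here for reproducibility
    print(update_gaussian(mu, sigma, x, delta=+1.0))  # mean moves toward x
    print(update_gaussian(mu, sigma, x, delta=-1.0))  # mean moves away from x
```

Under this assumed rule, a positive TD error for a value that lies within one standard deviation of the mean pulls μ toward that value and narrows σ, while a negative TD error does the opposite. This mirrors the strengthen-or-weaken behaviour the abstract describes.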

Keywords :

Gaussian distribution; genetic algorithms; graph theory; learning (artificial intelligence); mobile robots; search problems; Gaussian distribution; PBILc; PMBGNP-AC; RL technique; TD error; actor-critic technique; autonomous robot control; continuous estimation; continuous population-based incremental learning; continuous search space; continuous value selection; continuous variables distribution; evolving graph structures; graph-based EDA; graph-based estimation of distribution algorithm; mean value; node connections; optimal graph structure determination; probabilistic model building genetic network programming; probability density functions; reinforcement learning; scalar reinforcement signal; sensor values; standard deviation; temporal difference error; wheel speeds;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Evolutionary Computation (CEC), 2012 IEEE Congress on

Conference_Location :

Brisbane, QLD

Print_ISBN :

978-1-4673-1510-4

Electronic_ISBN :

978-1-4673-1508-1

Type :

conf

DOI :

10.1109/CEC.2012.6256481

Filename :

6256481

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2821012