A modified actor-critic reinforcement learning algorithm

Author

Mustapha, Sidi M. ; Lachiver, Gerard

Author_Institution

Dept. of Electr. Eng. & Comput. Eng., Sherbrooke Univ., Que., Canada

Volume

2

fYear

2000

fDate

2000

Firstpage

605

Abstract

This paper proposes a fast and efficient actor-critic reinforcement learning algorithm that is novel in at least two ways: it updates the critic only when the best action is executed, and it takes full advantage of the powerful temporal difference (TD) prediction method to train a continuous-valued actor. The actor and the critic are represented separately by two adaptive neural fuzzy systems tuned by a backpropagation algorithm. While the critic adapts to the actor by minimizing the quadratic sum of the TD error, the actor adapts to the critic by using not only the TD error but also the state value function. The new actor-critic architecture is applied to an inverted pendulum system, which is widely used to compare reinforcement learning architectures.
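The critic rule described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a linear critic V(s) = w . phi(s) in place of the adaptive neural fuzzy system, a standard one-step TD error, and a boolean flag supplied by the caller to indicate whether the best (non-exploratory) action was executed. The names `td_error`, `critic_step`, `gamma`, and `lr` are hypothetical. The actor update is omitted because the abstract does not give its form.

```python
import numpy as np


def td_error(reward, value, next_value, gamma=0.95):
    """One-step TD error: delta = r + gamma * V(s') - V(s)."""
    return reward + gamma * next_value - value


def critic_step(weights, features, next_features, reward,
                best_action_executed, gamma=0.95, lr=0.1):
    """Gradient step on 0.5 * delta**2 for a linear critic (an assumption).

    The weights are left unchanged when an exploratory action was taken,
    mirroring the "update only when the best action is executed" idea.
    """
    value = float(weights @ features)
    next_value = float(weights @ next_features)
    delta = td_error(reward, value, next_value, gamma)
    if best_action_executed:
        # Semi-gradient: the target r + gamma * V(s') is treated as constant.
        weights = weights + lr * delta * features
    return weights, delta
```

In this sketch the TD error is still returned when the critic is frozen, so a caller could use it elsewhere, for example in an actor update of its own design.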

Keywords

adaptive systems; backpropagation; computational complexity; fuzzy neural nets; minimisation; temporal reasoning; TD prediction method; adaptive neural fuzzy systems; backpropagation algorithm; continuous-valued actor training; inverted pendulum system; modified actor-critic reinforcement learning algorithm; quadratic sum minimization; reinforcement learning architectures; state value function; temporal difference prediction method; Adaptive systems; Backpropagation algorithms; Delay; Fuzzy systems; Learning systems; Power engineering and energy; Power engineering computing; Prediction methods; State estimation; Stochastic processes;

fLanguage

English

Publisher

IEEE

Conference_Titel

Electrical and Computer Engineering, 2000 Canadian Conference on

Conference_Location

Halifax, NS

ISSN

0840-7789

Print_ISBN

0-7803-5957-7

Type

conf

DOI

10.1109/CCECE.2000.849537

Filename

849537