DocumentCode
2105996
Title
A modified actor-critic reinforcement learning algorithm
Author
Mustapha, Sidi M. ; Lachiver, Gerard
Author_Institution
Dept. of Electr. Eng. & Comput. Eng., Sherbrooke Univ., Que., Canada
Volume
2
fYear
2000
fDate
2000
Firstpage
605
Abstract
This paper proposes a fast and efficient actor-critic reinforcement learning algorithm that is novel in at least two ways: it updates the critic only when the best action is executed, and it takes full advantage of the powerful temporal difference (TD) prediction method to train a continuous-valued actor. The actor and the critic are each represented by a separate adaptive neural fuzzy system tuned by a backpropagation algorithm. The critic adapts to the actor by minimizing the quadratic sum of the TD error, while the actor adapts to the critic by using not only the TD error but also the state value function. The new actor-critic architecture is applied to an inverted pendulum system, which is widely used to compare reinforcement learning architectures.
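As a rough illustration of the critic side of the abstract, the sketch below computes a standard one-step TD error and applies a semi-gradient step on half its square, skipping the update when the executed action was not the best one. It is not the authors' implementation: the tabular value list (in place of the adaptive neural fuzzy critic), the function names `td_error` and `critic_update`, the flag `executed_best_action`, and the values of `gamma` and `lr` are all assumptions made for illustration. The actor update is left out because the abstract does not give its form.

```python
def td_error(reward, value_s, value_next, gamma=0.95):
    """One-step temporal difference error: r + gamma * V(s') - V(s)."""
    return reward + gamma * value_next - value_s


def critic_update(values, state, next_state, reward, executed_best_action,
                  gamma=0.95, lr=0.1):
    """Semi-gradient step on 0.5 * delta**2 for a tabular critic.

    Assumption: `values` is a plain list indexed by state, standing in for
    the neural fuzzy critic described in the abstract. The update is skipped
    when an exploratory (non-best) action was executed.
    """
    delta = td_error(reward, values[state], values[next_state], gamma)
    if executed_best_action:
        # Treating the target r + gamma * V(s') as fixed, the gradient of
        # 0.5 * delta**2 with respect to V(s) is -delta.
        values[state] += lr * delta
    return delta


# Hypothetical two-state example.
values = [0.0, 0.0]
delta = critic_update(values, state=0, next_state=1, reward=1.0,
                      executed_best_action=True)
```

With these made-up numbers the value of state 0 moves toward the TD target. Calling `critic_update` with `executed_best_action=False` returns the same TD error but leaves `values` unchanged.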
Keywords
adaptive systems; backpropagation; computational complexity; fuzzy neural nets; minimisation; temporal reasoning; TD prediction method; adaptive neural fuzzy systems; backpropagation algorithm; continuous-valued actor training; inverted pendulum system; modified actor-critic reinforcement learning algorithm; quadratic sum minimization; reinforcement learning architectures; state value function; temporal difference prediction method; Adaptive systems; Backpropagation algorithms; Delay; Fuzzy systems; Learning systems; Power engineering and energy; Power engineering computing; Prediction methods; State estimation; Stochastic processes;
fLanguage
English
Publisher
IEEE
Conference_Titel
Electrical and Computer Engineering, 2000 Canadian Conference on
Conference_Location
Halifax, NS
ISSN
0840-7789
Print_ISBN
0-7803-5957-7
Type
conf
DOI
10.1109/CCECE.2000.849537
Filename
849537