Title :
A modified actor-critic reinforcement learning algorithm
Author :
Mustapha, Sidi M. ; Lachiver, Gerard
Author_Institution :
Dept. of Electr. Eng. & Comput. Eng., Sherbrooke Univ., Que., Canada
Abstract :
This paper proposes a fast and efficient actor-critic reinforcement learning algorithm that is novel in at least two ways: it updates the critic only when the best action is executed, and it takes full advantage of the powerful temporal difference (TD) prediction method to train a continuous-valued actor. The actor and the critic are represented by two separate adaptive neural fuzzy systems, each tuned by a backpropagation algorithm. While the critic adapts to the actor by minimizing the quadratic sum of the TD error, the actor adapts to the critic using not only the TD error but also the state value function. The new actor-critic architecture is applied to an inverted pendulum system, which is widely used to compare reinforcement learning architectures.
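The critic rule summarized in the abstract (minimize the squared TD error, but update only when the best action was executed) can be illustrated with a minimal sketch. It is not the authors' method. It swaps the paper's adaptive neural fuzzy critic for a hypothetical linear critic V(s) = w . phi(s) trained by a semi-gradient TD(0) step, and it assumes that "best action executed" is supplied by the caller as a boolean flag (for example, true when the actor's unperturbed output was applied). The names `LinearCritic`, `td_error`, `lr` and `gamma` are illustrative, and the actor update, which per the abstract also uses the state value function, is not reproduced.

```python
import numpy as np


def td_error(reward, v_s, v_next, gamma, terminal=False):
    """TD(0) error: delta = r + gamma * V(s') - V(s); no bootstrap at a terminal state."""
    target = reward if terminal else reward + gamma * v_next
    return target - v_s


class LinearCritic:
    """Hypothetical linear stand-in for the paper's neuro-fuzzy critic: V(s) = w . phi(s)."""

    def __init__(self, n_features, lr=0.1, gamma=0.95):
        self.w = np.zeros(n_features)
        self.lr = lr        # learning rate (assumed value)
        self.gamma = gamma  # discount factor (assumed value)

    def value(self, phi):
        return float(self.w @ phi)

    def update(self, phi, reward, phi_next, terminal, executed_best_action):
        """Return the TD error; adjust w only if the best action was executed (assumed gating rule)."""
        delta = td_error(reward, self.value(phi), self.value(phi_next), self.gamma, terminal)
        if executed_best_action:
            # Holding the target fixed, d(0.5 * delta**2)/dw = -delta * phi,
            # so a gradient-descent step on the squared TD error is w += lr * delta * phi.
            self.w += self.lr * delta * phi
        return delta
```

The TD error is returned even when the critic is not updated, since in an actor-critic scheme the actor can still use it as its training signal. For example, with `LinearCritic(2, lr=0.5, gamma=0.9)`, a first gated-in transition with reward 1 from `phi = [1, 0]` to `phi_next = [0, 1]` yields a TD error of 1.0 and weights `[0.5, 0.0]`; repeating the same transition with `executed_best_action=False` yields a TD error of 0.5 and leaves the weights unchanged.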
Keywords :
adaptive systems; backpropagation; computational complexity; fuzzy neural nets; minimisation; temporal reasoning; TD prediction method; adaptive neural fuzzy systems; backpropagation algorithm; continuous-valued actor training; inverted pendulum system; modified actor-critic reinforcement learning algorithm; quadratic sum minimization; reinforcement learning architectures; state value function; temporal difference prediction method; Adaptive systems; Backpropagation algorithms; Delay; Fuzzy systems; Learning systems; Power engineering and energy; Power engineering computing; Prediction methods; State estimation; Stochastic processes
Conference_Title :
Electrical and Computer Engineering, 2000 Canadian Conference on
Conference_Location :
Halifax, NS
Print_ISBN :
0-7803-5957-7
DOI :
10.1109/CCECE.2000.849537