Title :
Performance of temporal differences and reinforcement learning in the cart-pole experiment
Author :
Geva, Shlomo ; Sitte, Joaquin
Author_Institution :
Fac. of Inf. Technol., Queensland Univ. of Technol., Brisbane, Qld., Australia
Abstract :
A comparison of the many proposed schemes for learning control tasks requires a standardised characterisation of their performance. Such characterisations are not available for most of the methods used in connection with neural networks. We undertook to characterise the performance of the adaptive heuristic critic (AHC) method of Barto et al. (1983) on the cart-pole balancing problem. We present criteria for training and control performance and use them to compare AHC controllers with the best PD controllers. It turns out that the learning dynamics of the AHC method for the cart-pole are apparently chaotic, making performance assessment and parameter optimisation computationally very laborious.
Keywords :
adaptive control; intelligent control; learning (artificial intelligence); neurocontrollers; performance evaluation; robots; adaptive heuristic critic method; cart-pole balancing; learning control; neural networks; reinforcement learning; temporal differences; Australia; Bang-bang control; Chaos; Information technology; Learning; Neural networks; Optimization methods; PD control; Quantization; State-space methods
Conference_Title :
Neural Networks, 1993. IJCNN '93-Nagoya. Proceedings of 1993 International Joint Conference on
Print_ISBN :
0-7803-1421-2
DOI :
10.1109/IJCNN.1993.714314