Title : 
Three connectionist implementations of dynamic programming for optimal control: a preliminary comparative analysis
         
        
            Author : 
Bersini, Hugues ; Gorrini, Vittorio
         
        
            Author_Institution : 
IRIDIA, Univ. Libre de Bruxelles, Belgium
         
        
        
        
        
            Abstract : 
Three optimal control methodologies are presented in this paper and applied to the same elementary rendezvous problem. All three rely on neural networks for their universal approximation capabilities and on dynamic programming to replace the time-integral optimization with a succession of time-local optimizations. First, a simplified version of the backpropagation-through-time algorithm is presented as the most faithful implementation of dynamic programming when the optimal controller is approximated by a neural network (trained by gradient descent) and a process model is available. Relaxing the need for an explicit prior model of the process, reinforcement learning (RL) approaches for both continuous and discrete controllers are then described and tested on the rendezvous problem. The results and the numerous methodological difficulties we encountered are discussed. The most successful reinforcement learning approach is the connectionist implementation of Q-learning with all Q-values approximated by radial-basis-function networks. However, when searching for a continuous optimal controller, the price RL pays for the absence of a model turns out to be far from negligible in terms of methodological difficulties, lack of robustness, convergence time, and quality of the discovered solution.
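As a rough illustration of what "Q-learning with all Q-values approximated by radial-basis-function networks" can look like, the following is a minimal sketch under stated assumptions, not the authors' implementation. The one-dimensional toy plant, the three discrete position increments, the RBF centres and width, the quadratic cost, and all function names and learning parameters are hypothetical choices made only for this example; the abstract does not specify any of them.

```python
import math
import random

# All constants below are assumptions for illustration only.
ACTIONS = [-0.1, 0.0, 0.1]                     # hypothetical discrete position increments
CENTERS = [-1.0 + 0.2 * i for i in range(11)]  # RBF centres over an assumed state range [-1, 1]
WIDTH = 0.2                                    # assumed common Gaussian width


def rbf_features(x):
    """Normalised Gaussian RBF activations for a scalar state x."""
    phi = [math.exp(-0.5 * ((x - c) / WIDTH) ** 2) for c in CENTERS]
    total = sum(phi)
    return [p / total for p in phi]


def q_values(w, x):
    """One RBF approximator per action: Q(x, a) = w[a] . phi(x)."""
    phi = rbf_features(x)
    return [sum(wi * pi for wi, pi in zip(row, phi)) for row in w]


def step(x, a):
    """Toy plant: apply the increment; reward is minus the squared distance to the target at 0."""
    x_next = min(1.0, max(-1.0, x + ACTIONS[a]))
    return x_next, -(x_next ** 2)


def q_update(w, x, a, r, x_next, alpha=0.2, gamma=0.9):
    """One Q-learning step on the weights of action a; returns the TD error."""
    delta = r + gamma * max(q_values(w, x_next)) - q_values(w, x)[a]
    for i, p in enumerate(rbf_features(x)):
        w[a][i] += alpha * delta * p
    return delta


def train(episodes=200, steps=30, epsilon=0.2, seed=0):
    """Epsilon-greedy Q-learning from random start states; returns the weight table."""
    rng = random.Random(seed)
    w = [[0.0] * len(CENTERS) for _ in ACTIONS]
    for _ in range(episodes):
        x = rng.uniform(-1.0, 1.0)
        for _ in range(steps):
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))      # explore
            else:
                q = q_values(w, x)
                a = max(range(len(ACTIONS)), key=q.__getitem__)  # exploit
            x_next, r = step(x, a)
            q_update(w, x, a, r, x_next)
            x = x_next
    return w
```

Calling `train()` returns one weight vector per action, and `q_values(w, x)` can then be inspected to read off a greedy action for any state `x`. No model of the plant is used in the update itself, which is the property the abstract contrasts with the model-based backpropagation-through-time approach.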
         
        
            Keywords : 
dynamic programming; learning (artificial intelligence); optimal control; robust control; Q-learning; backpropagation-through-time algorithm; connectionist implementations; elementary rendezvous problem; gradient descent; radial-basis-function networks; reinforcement learning; time-local optimizations; universal approximation capabilities; Backpropagation algorithms; Cost function; Delay effects; Jacobian matrices; Lagrangian functions; Learning; Neural networks; Optimization methods
         
        
        
        
            Conference_Title : 
Neural Networks for Identification, Control, Robotics, and Signal/Image Processing, 1996. Proceedings., International Workshop on
         
        
            Conference_Location : 
Venice
         
        
            Print_ISBN : 
0-8186-7456-3
         
        
        
            DOI : 
10.1109/NICRSP.1996.542787