Title :
Reinforcement control via action dependent heuristic dynamic programming
Author :
Tang, K. Wendy ; Srikant, Govardhan
Author_Institution :
Dept. of Electr. Eng., State Univ. of New York, Stony Brook, NY, USA
Abstract :
Heuristic dynamic programming (HDP) is the simplest kind of adaptive critic, a powerful form of reinforcement control. It can be used to maximize or minimize any utility function of a system over time, such as total energy or trajectory error, in a noisy environment. Unlike supervised learning, adaptive critic design does not require the desired control signals to be known. Instead, feedback is obtained through a critic network that learns the relationship between a set of control signals and the corresponding strategic utility function; the scheme is thus an approximation of dynamic programming. An action-dependent heuristic dynamic programming (ADHDP) system involves two subnetworks, the action network and the critic network, each of which includes a feedforward and a feedback component. A flow chart of the interaction of these components is included. To further illustrate the algorithm, we use ADHDP to control a simple 2D planar robot.
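The action/critic interaction the abstract describes can be sketched as a training loop: the critic learns to satisfy a Bellman-style relation J(t) ≈ U(t) + γJ(t+1) between the control signals and the utility, and the action network is trained by backpropagating the critic's estimate of dJ/da. The following NumPy sketch is only an illustration of that loop, not the paper's implementation: the one-hidden-layer tanh networks, the point-mass "plant" standing in for the 2D planar robot, the squared trajectory-error utility, and all sizes and learning rates are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp_init(n_in, n_hidden, n_out):
    """One-hidden-layer tanh network, stored as a (W1, W2) pair."""
    return (rng.normal(0, 0.1, (n_hidden, n_in)),
            rng.normal(0, 0.1, (n_out, n_hidden)))

def mlp_forward(params, x):
    W1, W2 = params
    h = np.tanh(W1 @ x)
    return W2 @ h, h

def mlp_input_grad(params, x, grad_out):
    """Gradient of the network output w.r.t. its input (no weight update)."""
    W1, W2 = params
    h = np.tanh(W1 @ x)
    return W1.T @ (W2.T @ grad_out * (1 - h**2))

def mlp_step(params, x, h, grad_out, lr):
    """One gradient-descent step on the weights; returns updated params."""
    W1, W2 = params
    grad_h = W2.T @ grad_out * (1 - h**2)
    return (W1 - lr * np.outer(grad_h, x),
            W2 - lr * np.outer(grad_out, h))

# Hypothetical 1-DOF point-mass plant: state = (position, velocity),
# action = force. This stands in for the planar-robot dynamics.
def plant(state, action, dt=0.05):
    pos, vel = state
    return np.array([pos + dt * vel, vel + dt * action[0]])

def utility(state, target):
    """U(t): squared trajectory error, the cost the critic accumulates."""
    return float((state[0] - target) ** 2)

gamma, lr = 0.95, 0.01
critic = mlp_init(3, 8, 1)   # critic sees state (2) + action (1)
actor = mlp_init(2, 8, 1)    # action network sees state only

state, target = np.array([0.0, 0.0]), 1.0
for step in range(2000):
    # Feedforward pass of the action network: propose a control signal.
    action, h_a = mlp_forward(actor, state)
    next_state = plant(state, action)
    U = utility(next_state, target)

    # Critic evaluates J(state, action) now and at the next step.
    xa = np.concatenate([state, action])
    J, h_c = mlp_forward(critic, xa)
    next_action, _ = mlp_forward(actor, next_state)
    J_next, _ = mlp_forward(critic, np.concatenate([next_state, next_action]))

    # Critic feedback: move J(t) toward the target U(t) + gamma * J(t+1).
    td_error = J - (U + gamma * J_next)
    critic = mlp_step(critic, xa, h_c, td_error, lr)

    # Action feedback: backpropagate dJ/da through the critic so the
    # actor learns controls that reduce the estimated cost-to-go.
    grad_action = mlp_input_grad(critic, xa, np.ones(1))[2:]
    actor = mlp_step(actor, state, h_a, grad_action, lr)

    state = next_state
```

Because the utility here is a cost, the action network descends the critic's gradient; for a utility to be maximized, the sign of the actor update would flip.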
Keywords :
dynamic programming; feedforward neural nets; heuristic programming; learning (artificial intelligence); neurocontrollers; optimal control; recurrent neural nets; 2D planar robot; ADHDP; HDP; action network; action-dependent heuristic dynamic programming; adaptive critic; critic network; feedback; feedforward component; flow chart; reinforcement control; strategic utility function; total energy; trajectory error; utility function maximization; utility function minimization; Adaptive control; Backpropagation; Control systems; Dynamic programming; Feedback; Neurofeedback; Programmable control; Robots; Signal design; Supervised learning;
Conference_Titel :
Neural Networks, 1997, International Conference on
Conference_Location :
Houston, TX, USA
Print_ISBN :
0-7803-4122-8
DOI :
10.1109/ICNN.1997.614163