Reinforcement control via action dependent heuristic dynamic programming

Author

Tang, K. Wendy ; Srikant, Govardhan

Author_Institution

Dept. of Electr. Eng., State Univ. of New York, Stony Brook, NY, USA

Volume

3

fYear

1997

fDate

9-12 Jun 1997

Firstpage

1766

Abstract

Heuristic dynamic programming (HDP) is the simplest kind of adaptive critic, which is a powerful form of reinforcement control. It can be used to maximize or minimize any utility function, such as total energy or trajectory error, of a system over time in a noisy environment. Unlike supervised learning, adaptive critic design does not require that the desired control signals be known. Instead, feedback is obtained from a critic network, which learns the relationship between a set of control signals and the corresponding strategic utility function. The approach is an approximation of dynamic programming. An action-dependent heuristic dynamic programming (ADHDP) system involves two subnetworks, the action network and the critic network. Each of these networks includes a feedforward and a feedback component. A flow chart for the interaction of these components is included. To further illustrate the algorithm, we use ADHDP for the control of a simple 2D planar robot.
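
The interaction between the action network and the critic network can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no network sizes, plant model, or learning rules, so everything below is an assumption made for illustration. The scalar plant `plant`, the quadratic `utility`, the discount `GAMMA`, and the learning rates are hypothetical, and the two neural networks are replaced by a single linear gain (action) and a quadratic feature model (critic). Reading the "feedforward component" as the forward evaluation (`action`, `critic`) and the "feedback component" as the weight update (`action_update`, `critic_update`) is likewise an interpretation of the abstract.

```python
import numpy as np

GAMMA = 0.95  # discount factor (assumed value, not from the paper)


def plant(x, u, a=0.9, b=0.5):
    """Hypothetical scalar plant x' = a*x + b*u, standing in for the robot."""
    return a * x + b * u


def utility(x, u):
    """Hypothetical one-step utility to minimize: state error plus effort."""
    return x * x + u * u


def action(k, x):
    """Action model, feedforward pass: a single linear gain, u = -k*x."""
    return -k * x


def features(x, u):
    """Quadratic features of the (state, action) pair seen by the critic."""
    return np.array([x * x, x * u, u * u])


def critic(w, x, u):
    """Critic model, feedforward pass: J(x, u) = w . features(x, u)."""
    return float(w @ features(x, u))


def td_error(w, x, u, x_next, u_next):
    """Mismatch in the recursion J(t) = U(t) + GAMMA * J(t+1)."""
    return utility(x, u) + GAMMA * critic(w, x_next, u_next) - critic(w, x, u)


def critic_update(w, x, u, x_next, u_next, lr=0.1):
    """Critic feedback pass: semi-gradient step that shrinks the mismatch."""
    return w + lr * td_error(w, x, u, x_next, u_next) * features(x, u)


def action_update(k, w, x, lr=0.1):
    """Action feedback pass: propagate dJ/du back through the critic to k."""
    u = action(k, x)
    dJ_du = w[1] * x + 2.0 * w[2] * u  # derivative of the critic w.r.t. u
    du_dk = -x                         # derivative of u = -k*x w.r.t. k
    return k - lr * dJ_du * du_dk      # descend, since J is a cost here


def run_sketch(episodes=200, steps=20, seed=0):
    """Alternate critic and action updates along simulated trajectories."""
    rng = np.random.default_rng(seed)
    w, k = np.zeros(3), 0.0
    for _ in range(episodes):
        x = rng.uniform(-1.0, 1.0)  # random initial state for each episode
        for _ in range(steps):
            u = action(k, x)
            x_next = plant(x, u)
            u_next = action(k, x_next)
            w = critic_update(w, x, u, x_next, u_next, lr=0.05)
            k = action_update(k, w, x, lr=0.01)
            x = x_next
    return w, k
```

Calling `run_sketch()` returns the learned critic weights and action gain. The critic takes the action as an input, which is the "action-dependent" part: the action model can be improved using only the derivative of the critic output with respect to the action, without a separate plant model and without known target control signals.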

Keywords

dynamic programming; feedforward neural nets; heuristic programming; learning (artificial intelligence); neurocontrollers; optimal control; recurrent neural nets; 2D planar robot; ADHDP; HDP; action network; action-dependent heuristic dynamic programming; adaptive critic; critic network; feedback; feedforward component; flow chart; reinforcement control; strategic utility function; total energy; trajectory error; utility function maximization; utility function minimization; Adaptive control; Backpropagation; Control systems; Dynamic programming; Feedback; Neurofeedback; Programmable control; Robots; Signal design; Supervised learning;

fLanguage

English

Publisher

IEEE

Conference_Titel

Neural Networks, 1997., International Conference on

Conference_Location

Houston, TX

Print_ISBN

0-7803-4122-8

Type

conf

DOI

10.1109/ICNN.1997.614163

Filename

614163