Residual advantage learning applied to a differential game

Author

Harmon, Mance E. ; Baird, Leemon C., III

Volume

1

fYear

1996

fDate

3-6 Jun 1996

Firstpage

329

Abstract

An application of reinforcement learning to a differential game is presented. The reinforcement learning system uses a recently developed algorithm, the residual form of advantage learning. The game is a Markov decision process (MDP) with continuous states and nonlinear dynamics. The game consists of two players, a missile and a plane; the missile pursues the plane and the plane evades the missile. On each time step, each player chooses one of two possible actions: turn left or turn right 90 degrees. Reinforcement is given only when the missile hits the plane or the plane reaches an escape distance from the missile. The advantage function is stored in a single-hidden-layer sigmoidal network. The reinforcement learning algorithm for optimal control is modified for differential games in order to find the minimax point, rather than the maximum. As far as we know, this is the first time that a reinforcement learning algorithm with guaranteed convergence for general function approximation systems has been demonstrated to work with a general neural network.
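
The abstract's "minimax point, rather than the maximum" can be illustrated with a small sketch. This is not the authors' implementation: it assumes a plain 2x2 table of advantage values for a single state, restricts both players to pure actions, and arbitrarily treats the row player as the maximizer. The function name `minimax_point` and the numbers in `adv` are hypothetical.

```python
def minimax_point(adv):
    """Return (row_action, col_action, value) at the pure-strategy minimax point.

    adv[i][j] is a hypothetical advantage value when the maximizing player
    takes action i and the minimizing player takes action j
    (0 = turn left, 1 = turn right). Which player maximizes is an assumption.
    """
    # For each maximizer action, the minimizer picks the reply that hurts most.
    worst_case = [min(row) for row in adv]
    # The maximizer then picks the action with the best worst case.
    i = max(range(len(adv)), key=lambda k: worst_case[k])
    # The minimizer's best reply to that action.
    j = min(range(len(adv[i])), key=lambda k: adv[i][k])
    return i, j, adv[i][j]


# Made-up advantage values for one state, for illustration only.
adv = [[0.2, -0.5],
       [0.1, 0.4]]
result = minimax_point(adv)
```

In a single-agent control problem, the same table would be reduced with a plain maximum over actions; here the inner minimum models the opponent's choice.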

Keywords

convergence; differential games; function approximation; learning (artificial intelligence); neural nets; optimal control; Markov decision process; continuous states; differential game; general function approximation systems; guaranteed convergence; minimax point; nonlinear dynamics; optimal control; reinforcement learning; residual advantage learning; single-hidden-layer sigmoidal network; Aerospace electronics; Approximation algorithms; Convergence; Function approximation; Learning; Minimax techniques; Missiles; Neural networks; Optimal control; State estimation;

fLanguage

English

Publisher

IEEE

Conference_Titel

Neural Networks, 1996., IEEE International Conference on

Conference_Location

Washington, DC

Print_ISBN

0-7803-3210-5

Type

conf

DOI

10.1109/ICNN.1996.548913

Filename

548913