DocumentCode
2575310
Title
Online solution of nonlinear two-player zero-sum games using synchronous policy iteration
Author
Vamvoudakis, Kyriakos G. ; Lewis, F.L.
Author_Institution
Autom. & Robot. Res. Inst., Univ. of Texas at Arlington, Fort Worth, TX, USA
fYear
2010
fDate
15-17 Dec. 2010
Firstpage
3040
Lastpage
3047
Abstract
This paper presents an online gaming algorithm based on policy iteration that solves the continuous-time (CT) two-player zero-sum game with infinite-horizon cost for nonlinear systems with known dynamics. That is, the algorithm learns online, in real time, the solution to the game design Hamilton-Jacobi-Isaacs (HJI) equation. The method finds in real time suitable approximations of the optimal value and of the saddle-point control and disturbance policies, while also guaranteeing closed-loop stability. The adaptive algorithm is implemented as an actor/critic structure that involves simultaneous continuous-time adaptation of the critic, control actor, and disturbance neural networks. We call this online gaming algorithm 'synchronous' zero-sum game policy iteration. A persistence of excitation condition is shown to guarantee convergence of the critic to the actual optimal value function. Novel tuning algorithms are given for the critic, actor, and disturbance networks. Convergence to the optimal saddle-point solution is proven, and stability of the system is also guaranteed. Simulation examples show the effectiveness of the new algorithm.
Keywords
adaptive control; closed loop systems; continuous time systems; control system synthesis; game theory; infinite horizon; neurocontrollers; nonlinear control systems; optimal control; stability; adaptive algorithm; closed-loop stability; continuous-time adaptation; continuous-time two-player zero-sum game; control actor; disturbance network; disturbance neural network; disturbance policy; excitation condition; game design HJI equation; infinite horizon cost; nonlinear system; nonlinear two-player zero-sum game; online gaming algorithm; optimal saddle point solution; saddle point control policy; synchronous policy iteration; synchronous zero-sum game policy iteration; tuning algorithm; Approximation algorithms; Artificial neural networks; Convergence; Equations; Function approximation; Games; Approximate Dynamic Programming; H-infinity; Hamilton-Jacobi-Isaacs equation; Nash-equilibrium; Persistence of Excitation; Policy Iteration; Synchronous Zero-Sum Game Policy Iteration;
fLanguage
English
Publisher
IEEE
Conference_Titel
Decision and Control (CDC), 2010 49th IEEE Conference on
Conference_Location
Atlanta, GA
ISSN
0743-1546
Print_ISBN
978-1-4244-7745-6
Type
conf
DOI
10.1109/CDC.2010.5717607
Filename
5717607