• DocumentCode
    2575310
  • Title

    Online solution of nonlinear two-player zero-sum games using synchronous policy iteration

  • Author

    Vamvoudakis, Kyriakos G. ; Lewis, F.L.

  • Author_Institution
    Autom. & Robot. Res. Inst., Univ. of Texas at Arlington, Fort Worth, TX, USA
  • fYear
    2010
  • fDate
    15-17 Dec. 2010
  • Firstpage
    3040
  • Lastpage
    3047
  • Abstract
    In this paper we present an online gaming algorithm based on policy iteration to solve the continuous-time (CT) two-player zero-sum game with infinite horizon cost for nonlinear systems with known dynamics. That is, the algorithm learns online, in real time, the solution to the game design Hamilton-Jacobi-Isaacs (HJI) equation. This method finds in real time suitable approximations of the optimal value, the saddle point control policy, and the disturbance policy, while also guaranteeing closed-loop stability. The adaptive algorithm is implemented as an actor/critic structure which involves simultaneous continuous-time adaptation of critic, control actor, and disturbance neural networks. We call this online gaming algorithm 'synchronous' zero-sum game policy iteration. A persistence of excitation condition is shown to guarantee convergence of the critic to the actual optimal value function. Novel tuning algorithms are given for the critic, actor, and disturbance networks. Convergence to the optimal saddle point solution is proven, and stability of the system is also guaranteed. Simulation examples show the effectiveness of the new algorithm.
  • Keywords
    adaptive control; closed loop systems; continuous time systems; control system synthesis; game theory; infinite horizon; neurocontrollers; nonlinear control systems; optimal control; stability; adaptive algorithm; closed-loop stability; continuous-time adaptation; continuous-time two-player zero-sum game; control actor; disturbance network; disturbance neural network; disturbance policy; excitation condition; game design HJI equation; infinite horizon cost; nonlinear system; nonlinear two-player zero-sum game; online gaming algorithm; optimal saddle point solution; saddle point control policy; synchronous policy iteration; synchronous zero-sum game policy iteration; tuning algorithm; Approximation algorithms; Artificial neural networks; Convergence; Equations; Function approximation; Games; Approximate Dynamic Programming; H-infinity; Hamilton-Jacobi-Isaacs equation; Nash-equilibrium; Persistence of Excitation; Policy Iteration; Synchronous Zero-Sum Game Policy Iteration;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Decision and Control (CDC), 2010 49th IEEE Conference on
  • Conference_Location
    Atlanta, GA
  • ISSN
    0743-1546
  • Print_ISBN
    978-1-4244-7745-6
  • Type

    conf

  • DOI
    10.1109/CDC.2010.5717607
  • Filename
    5717607