• DocumentCode
    58712
  • Title

    Online Synchronous Approximate Optimal Learning Algorithm for Multi-Player Non-Zero-Sum Games With Unknown Dynamics

  • Author

    Derong Liu ; Hongliang Li ; Ding Wang

  • Author_Institution
    State Key Lab. of Manage. & Control for Complex Syst., Inst. of Autom., Beijing, China
  • Volume
    44
  • Issue
    8
  • fYear
    2014
  • fDate
    Aug. 2014
  • Firstpage
    1015
  • Lastpage
    1027
  • Abstract
    In this paper, we develop an online synchronous approximate optimal learning algorithm based on policy iteration to solve a multiplayer nonzero-sum game without the requirement of exact knowledge of dynamical systems. First, we prove that the online policy iteration algorithm for the nonzero-sum game is mathematically equivalent to the quasi-Newton's iteration in a Banach space. Then, a model neural network is established to identify the unknown continuous-time nonlinear system using input-output data. For each player, a critic neural network and an action neural network are used to approximate its value function and control policy, respectively. Our algorithm only needs to tune the weights of critic neural networks, so there will be less computational complexity during the learning process. All the neural network weights are updated online in real-time, continuously and synchronously. Furthermore, the uniform ultimate bounded stability of the closed-loop system is proved based on Lyapunov approach. Finally, two simulation examples are given to demonstrate the effectiveness of the developed scheme.
  • Keywords
    Lyapunov methods; Newton method; closed loop systems; computational complexity; continuous time systems; function approximation; game theory; learning (artificial intelligence); mathematics computing; neural nets; nonlinear systems; stability; Banach space; Lyapunov approach; action neural network; closed-loop system; computational complexity; control policy; input-output data; model neural network; multiplayer nonzero-sum games; online policy iteration algorithm; online synchronous approximate optimal learning algorithm; quasi-Newton iteration; uniform ultimate bounded stability; unknown continuous-time nonlinear system identification; unknown dynamics; value function approximation; Approximation algorithms; Dynamic programming; Equations; Games; Heuristic algorithms; Mathematical model; Nonlinear systems; Adaptive dynamic programming (ADP); approximate dynamic programming; multiplayer nonzero-sum games; neural networks; neuro-dynamic programming; policy iteration
  • fLanguage
    English
  • Journal_Title
    Systems, Man, and Cybernetics: Systems, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    2168-2216
  • Type

    jour

  • DOI
    10.1109/TSMC.2013.2295351
  • Filename
    6710226