Online Synchronous Approximate Optimal Learning Algorithm for Multi-Player Non-Zero-Sum Games With Unknown Dynamics

Author

Derong Liu ; Hongliang Li ; Ding Wang

Author_Institution

State Key Lab. of Manage. & Control for Complex Syst., Inst. of Autom., Beijing, China

Volume

44

Issue

8

fYear

2014

fDate

Aug. 2014

Firstpage

1015

Lastpage

1027

Abstract

In this paper, we develop an online synchronous approximate optimal learning algorithm based on policy iteration to solve a multiplayer nonzero-sum game without requiring exact knowledge of the system dynamics. First, we prove that the online policy iteration algorithm for the nonzero-sum game is mathematically equivalent to a quasi-Newton iteration in a Banach space. Then, a model neural network is established to identify the unknown continuous-time nonlinear system using input-output data. For each player, a critic neural network and an action neural network are used to approximate its value function and control policy, respectively. Our algorithm only needs to tune the weights of the critic neural networks, which reduces the computational complexity of the learning process. All the neural network weights are updated online in real time, continuously and synchronously. Furthermore, the uniformly ultimately bounded stability of the closed-loop system is proved using the Lyapunov approach. Finally, two simulation examples are given to demonstrate the effectiveness of the developed scheme.
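
As a rough illustration of the policy-iteration idea named in the abstract, the sketch below runs synchronous policy evaluation and policy improvement on a two-player scalar linear-quadratic nonzero-sum game. It is not the paper's algorithm. The dynamics are assumed known and linear, so no model, critic, or action neural networks appear, and the function names, the cost structure, and all numerical parameters are hypothetical choices made for this example.

```python
# Minimal sketch, NOT the paper's method: policy iteration for an assumed
# two-player scalar linear-quadratic nonzero-sum game with KNOWN dynamics
#   dx/dt = a*x + b[0]*u_0 + b[1]*u_1,
#   J_i   = integral of (q[i]*x^2 + r[i][0]*u_0^2 + r[i][1]*u_1^2) dt.
# Policies are u_i = -k[i]*x and value functions are V_i(x) = p[i]*x^2.


def policy_iteration_scalar_game(a, b, q, r, k0=(0.0, 0.0), iters=200):
    """Evaluate both players' policies, then improve both at once."""
    k = list(k0)
    p = [0.0, 0.0]
    for _ in range(iters):
        # Closed-loop pole under the current pair of policies.
        a_cl = a - b[0] * k[0] - b[1] * k[1]
        if a_cl >= 0.0:
            raise ValueError("current policies are not stabilizing")
        # Policy evaluation: solve 2*a_cl*p_i + q_i + sum_j r_ij*k_j^2 = 0.
        for i in range(2):
            running_cost = q[i] + r[i][0] * k[0] ** 2 + r[i][1] * k[1] ** 2
            p[i] = running_cost / (-2.0 * a_cl)
        # Policy improvement, applied to both players synchronously.
        k = [b[i] * p[i] / r[i][i] for i in range(2)]
    return p, k


def coupled_riccati_residuals(a, b, q, r, p):
    """Residuals of the coupled scalar Riccati equations at p."""
    k = [b[i] * p[i] / r[i][i] for i in range(2)]
    a_cl = a - b[0] * k[0] - b[1] * k[1]
    return [
        2.0 * a_cl * p[i] + q[i] + r[i][0] * k[0] ** 2 + r[i][1] * k[1] ** 2
        for i in range(2)
    ]


if __name__ == "__main__":
    # Hypothetical parameters; a < 0 makes the zero policy admissible.
    a, b = -1.0, (1.0, 1.0)
    q = (1.0, 2.0)
    r = ((1.0, 0.5), (0.5, 1.0))
    p, k = policy_iteration_scalar_game(a, b, q, r)
    print("value coefficients:", p)
    print("feedback gains:", k)
    print("residuals:", coupled_riccati_residuals(a, b, q, r, p))
```

Residuals close to zero indicate that the iteration has reached a solution of the coupled Riccati equations for this toy problem, which is the linear-quadratic counterpart of the coupled Hamilton-Jacobi equations of a nonzero-sum game. Convergence of this simple iteration is not guaranteed for arbitrary parameters.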

Keywords

Lyapunov methods; Newton method; closed loop systems; computational complexity; continuous time systems; function approximation; game theory; learning (artificial intelligence); mathematics computing; neural nets; nonlinear systems; stability; Banach space; Lyapunov approach; action neural network; closed-loop system; control policy; input-output data; model neural network; multiplayer nonzero-sum games; online policy iteration algorithm; online synchronous approximate optimal learning algorithm; quasi-Newton iteration; uniform ultimate bounded stability; unknown continuous-time nonlinear system identification; unknown dynamics; value function approximation; Approximation algorithms; Dynamic programming; Equations; Games; Heuristic algorithms; Mathematical model; Nonlinear systems; Adaptive dynamic programming (ADP); approximate dynamic programming; neural networks; neuro-dynamic programming; policy iteration

fLanguage

English

Journal_Title

Systems, Man, and Cybernetics: Systems, IEEE Transactions on

Publisher

IEEE

ISSN

2168-2216

Type

jour

DOI

10.1109/TSMC.2013.2295351

Filename

6710226