DocumentCode
66010
Title
Learning in Mean-Field Games
Author
Yin, Huibing ; Mehta, Prashant G. ; Meyn, Sean P. ; Shanbhag, Uday V.
Author_Institution
Dept. of Mech. Sci. & Eng., Univ. of Illinois at Urbana-Champaign, Urbana, IL, USA
Volume
59
Issue
3
fYear
2014
fDate
March 2014
Firstpage
629
Lastpage
644
Abstract
The purpose of this paper is to show how insight obtained from a mean-field model can be used to create an architecture for approximate dynamic programming (ADP) for a certain class of games comprising a large number of agents. The general technique is illustrated with the aid of a mean-field oscillator game model introduced in our prior work. The states of the model are interpreted as the phase angles for a collection of nonhomogeneous oscillators, and in this way the model may be regarded as an extension of the classical coupled oscillator model of Kuramoto. The paper introduces ADP techniques for the design and adaptation (learning) of approximately optimal control laws for this model. For this purpose, a parameterization is proposed, based on an analysis of the mean-field PDE model for the game. In an offline setting, a Galerkin procedure is introduced to choose the optimal parameters, while in an online setting a steepest descent algorithm is proposed. The paper provides a detailed analysis of the optimal parameter values as well as the Bellman error, with both the Galerkin approximation and the online algorithm. Finally, a phase transition result is described for the large population limit when each oscillator uses the approximately optimal control law. A critical value of the control penalty parameter is identified: above this value the oscillators are incoherent, and below it (when control is sufficiently cheap) the oscillators synchronize. These conclusions are illustrated with results from numerical experiments.
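The abstract compresses several steps: a Kuramoto-type oscillator model (the classical dynamics being dθ_i/dt = ω_i + (K/N) Σ_j sin(θ_j − θ_i)), a parameterized control law, and an online steepest-descent update. As a rough illustration of that pipeline only, the sketch below simulates a population of noisy phase oscillators in which each agent adapts a scalar control gain by stochastic gradient descent on a surrogate cost. The control basis, the cost, the heuristic gradient, and all constants are assumptions made for the demonstration; this is not the paper's ADP parameterization or its Bellman-error algorithm.

```python
# Illustrative sketch only -- NOT the paper's ADP parameterization or its
# Bellman-error algorithm. N noisy phase oscillators are coupled through
# the mean-field order parameter; each agent applies the assumed one-term
# control law u_i = -theta_i * r * sin(phase_i - psi) and adapts theta_i
# by stochastic steepest descent on a heuristic surrogate cost.
import numpy as np

rng = np.random.default_rng(0)
N, dt, steps = 200, 0.01, 20000
R = 0.2                                 # control penalty weight (assumed)
sigma = 0.3                             # noise intensity (assumed)
lr = 0.05                               # descent step size (assumed)
omega = rng.normal(1.0, 0.1, size=N)    # heterogeneous natural frequencies
phase = rng.uniform(0.0, 2 * np.pi, N)  # phase angles (the model's states)
theta = np.zeros(N)                     # per-agent control-gain parameter

for _ in range(steps):
    z = np.exp(1j * phase).mean()       # complex order parameter
    r, psi = np.abs(z), np.angle(z)     # coherence r, mean phase psi
    basis = r * np.sin(phase - psi)     # assumed control basis function
    u = -theta * basis                  # parameterized control law
    # Heuristic gradient of c(phase, u) = 0.5*(1 - cos(phase - psi))
    # + 0.5*R*u^2, differentiating the coupling term as if u acted
    # directly on the phase error (a crude stand-in for the Bellman error).
    dc_du = R * u + 0.5 * np.sin(phase - psi)
    theta -= lr * dc_du * (-basis)      # chain rule: du/dtheta = -basis
    # Euler-Maruyama step of the controlled noisy oscillator dynamics.
    phase = (phase + (omega + u) * dt
             + sigma * np.sqrt(dt) * rng.normal(size=N)) % (2 * np.pi)

print(f"order parameter r = {abs(np.exp(1j * phase).mean()):.3f}")
```

In this toy model, a large penalty R keeps the learned gains near zero and the population incoherent (r stays small), while a small R lets the gains grow until the oscillators lock (r approaches 1), qualitatively echoing the abstract's phase-transition statement; the critical penalty value in the actual paper comes from the mean-field PDE analysis, not from any surrogate of this kind.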
Keywords
Galerkin method; approximation theory; dynamic programming; game theory; gradient methods; learning (artificial intelligence); partial differential equations; ADP techniques; Bellman error; Galerkin approximation; approximate dynamic programming; control penalty parameter; mean-field PDE model analysis; mean-field game model; nonhomogeneous oscillators; online algorithm; optimal control laws; optimal parameter selection; partial differential equation; phase transition; steepest descent algorithm; stochastic learning; Approximation methods; Equations; Games; Mathematical model; Oscillators; Sociology; Statistics; Mean-field game; nonlinear systems; synchronization
fLanguage
English
Journal_Title
IEEE Transactions on Automatic Control
Publisher
IEEE
ISSN
0018-9286
Type
jour
DOI
10.1109/TAC.2013.2287733
Filename
6646268