Adaptive zero-sum stochastic game for two finite Markov chains

Author

Poznyak, A.S. ; Najim, K.

Author_Institution

Control Autom., CINVESTAV-IPN, Mexico City, Mexico

Volume

1

fYear

2000

fDate

2000

Firstpage

717

Abstract

A repeated zero-sum stochastic game between two finite Markov chains with unknown transition matrices and payoffs is considered. The control objective is to reach the equilibrium point using only current measurements. The behavior of each player is modelled by a finite controlled Markov chain. A novel adaptive policy is developed, based on Lagrange multipliers embedded in a “learning through reinforcement” procedure. A regularized Lagrange function and a new normalization procedure are introduced. The saddle point of this function is shown to be unique. Convergence is proved, and the order of almost sure convergence is estimated as n^(-1/3).

Keywords

Lyapunov methods; Markov processes; convergence; matrix algebra; probability; stochastic games; adaptive policy; adaptive zero-sum stochastic game; control objective; convergence properties; equilibrium point; finite controlled Markov chain; normalization procedure; regularized Lagrange function; reinforcement learning; repeated game; saddle-point; Adaptive control; Automatic control; Convergence; Current measurement; Laboratories; Lagrangian functions; Process control; Programmable control; Recursive estimation; Stochastic processes

fLanguage

English

Publisher

IEEE

Conference_Titel

Decision and Control, 2000. Proceedings of the 39th IEEE Conference on

Conference_Location

Sydney, NSW

ISSN

0191-2216

Print_ISBN

0-7803-6638-7

Type

conf

DOI

10.1109/CDC.2000.912852

Filename

912852