Title : 
Adaptive zero-sum stochastic game for two finite Markov chains
         
        
            Author : 
Poznyak, A.S. ; Najim, K.
         
        
            Author_Institution : 
Control Autom., CINVESTAV-IPN, Mexico City, Mexico
         
        
        
        
        
        
            Abstract : 
A repeated zero-sum stochastic game between two finite Markov chains with unknown transition matrices and payoffs is considered. The control objective is to obtain the equilibrium point based only on current measurements. The behavior of each player is modelled by a finite controlled Markov chain. A novel adaptive policy is developed, based on Lagrange multipliers involved in a “learning through reinforcement” procedure. A regularized Lagrange function and a new normalization procedure are introduced. The saddle-point of this function is shown to be unique. The convergence properties are proved, and the order of almost sure convergence is estimated as n^(-1/3).
         
        
            Keywords : 
Lyapunov methods; Markov processes; convergence; matrix algebra; probability; stochastic games; adaptive policy; adaptive zero-sum stochastic game; control objective; convergence properties; equilibrium point; finite controlled Markov chain; normalization procedure; regularized Lagrange function; reinforcement learning; repeated game; saddle-point; Adaptive control; Automatic control; Convergence; Current measurement; Laboratories; Lagrangian functions; Process control; Programmable control; Recursive estimation; Stochastic processes;
         
        
        
        
            Conference_Title : 
Decision and Control, 2000. Proceedings of the 39th IEEE Conference on
         
        
            Conference_Location : 
Sydney, NSW
         
        
        
            Print_ISBN : 
0-7803-6638-7
         
        
        
            DOI : 
10.1109/CDC.2000.912852