Adversarial multi-armed bandit approach to two-person zero-sum Markov games

Author

Chang, Hyeong Soo ; Fu, Michael C. ; Marcus, Steven I.

Author_Institution

Sogang Univ., Seoul

fYear

2007

fDate

12-14 Dec. 2007

Firstpage

127

Lastpage

132

Abstract

A sampling-based algorithm for solving stochastic optimization problems, based on Auer et. al.\´s Exp3 algorithm for "adversarial multi-armed bandit problems," has been recently presented by the authors. In particular, the authors recursively extended the Exp3-based algorithm for solving finite-horizon Markov decision processes (MDPs) and analyzed its finite-iteration performance in terms of the expected bias relative to the maximum value of the "recursive sample- average-approximation (SAA)" problem induced by the sampling process in the algorithm, showing that the upper bound of the expected bias approaches zero as the sampling size per state sampled in each stage goes to infinity, leading to the convergence to the optimal value of the original MDP problem in the limit. As a sequel to the previous work, the idea is further extended for solving two-person zero-sum Markov games (MGs), providing a finite-iteration bound to the equilibrium value of the induced "recursive SAA game" problem and asymptotic convergence to the true equilibrium value. The recursively extended algorithm for MGs can be used for breaking the curse of dimensionality.

Keywords

Markov processes; convergence; game theory; optimisation; sampling methods; Exp3 algorithm; adversarial multiarmed bandit problem; asymptotic convergence; finite iteration performance; finite-horizon Markov decision process; recursive sample-average-approximation problem; sampling-based algorithm; state sampling; stochastic optimization problems; two-person zero-sum Markov games; Approximation algorithms; Convergence; Game theory; H infinity control; Performance analysis; Probability distribution; Sampling methods; State-space methods; Stochastic processes; Upper bound; Markov decision process; Markov game; sample average approximation; sampling;

fLanguage

English

Publisher

ieee

Conference_Titel

Decision and Control, 2007 46th IEEE Conference on

Conference_Location

New Orleans, LA

ISSN

0191-2216

Print_ISBN

978-1-4244-1497-0

Electronic_ISBN

0191-2216

Type

conf

DOI

10.1109/CDC.2007.4434044

Filename

4434044