• DocumentCode
    2814828
  • Title

    Adversarial multi-armed bandit approach to two-person zero-sum Markov games

  • Author

    Chang, Hyeong Soo ; Fu, Michael C. ; Marcus, Steven I.

  • Author_Institution
    Sogang Univ., Seoul
  • fYear
    2007
  • fDate
    12-14 Dec. 2007
  • Firstpage
    127
  • Lastpage
    132
  • Abstract
    A sampling-based algorithm for solving stochastic optimization problems, based on Auer et. al.\´s Exp3 algorithm for "adversarial multi-armed bandit problems," has been recently presented by the authors. In particular, the authors recursively extended the Exp3-based algorithm for solving finite-horizon Markov decision processes (MDPs) and analyzed its finite-iteration performance in terms of the expected bias relative to the maximum value of the "recursive sample- average-approximation (SAA)" problem induced by the sampling process in the algorithm, showing that the upper bound of the expected bias approaches zero as the sampling size per state sampled in each stage goes to infinity, leading to the convergence to the optimal value of the original MDP problem in the limit. As a sequel to the previous work, the idea is further extended for solving two-person zero-sum Markov games (MGs), providing a finite-iteration bound to the equilibrium value of the induced "recursive SAA game" problem and asymptotic convergence to the true equilibrium value. The recursively extended algorithm for MGs can be used for breaking the curse of dimensionality.
  • Keywords
    Markov processes; convergence; game theory; optimisation; sampling methods; Exp3 algorithm; adversarial multiarmed bandit problem; asymptotic convergence; finite iteration performance; finite-horizon Markov decision process; recursive sample-average-approximation problem; sampling-based algorithm; state sampling; stochastic optimization problems; two-person zero-sum Markov games; Approximation algorithms; Convergence; Game theory; H infinity control; Performance analysis; Probability distribution; Sampling methods; State-space methods; Stochastic processes; Upper bound; Markov decision process; Markov game; sample average approximation; sampling;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Decision and Control, 2007 46th IEEE Conference on
  • Conference_Location
    New Orleans, LA
  • ISSN
    0191-2216
  • Print_ISBN
    978-1-4244-1497-0
  • Electronic_ISBN
    0191-2216
  • Type

    conf

  • DOI
    10.1109/CDC.2007.4434044
  • Filename
    4434044