• DocumentCode
    2467929
  • Title

    Adversarial Multi-Armed Bandit Approach to Stochastic Optimization

  • Author

    Chang, Hyeong Soo ; Fu, Michael C. ; Marcus, Steven I.

  • Author_Institution
    Dept. of Comput. Sci. & Eng., Sogang Univ., Seoul
  • fYear
    2006
  • fDate
    13-15 Dec. 2006
  • Firstpage
    5681
  • Lastpage
    5686
  • Abstract
    We first present a sampling-based algorithm for solving stochastic optimization problems based on the Exp3 algorithm by Auer et al for "adversarial multi-armed bandit problems." The value returned by the algorithm is an upper-bound estimate of the sample-average-maximum for a sample average approximation (SAA) problem induced by the sampling process of the algorithm, and the estimate converges to the optimal objective-function value as the number of samples goes to infinity. We then recursively extend the Exp3-based algorithm for solving a given finite-horizon Markov decision process (MDP) and analyze its finite-time performance in terms of the expected bias relative to the maximum value of the induced recursive SAA problem, showing that the upper bound of the expected bias approaches zero as the sampling size per stage goes to infinity, leading to the convergence to the optimal value of the original MDP problem in the limit
  • Keywords
    Markov processes; decision theory; optimisation; sampling methods; Markov decision process; multiarmed bandit approach; sample average approximation; sampling-based algorithm; stochastic optimization; Algorithm design and analysis; Approximation algorithms; Computer science; Educational institutions; H infinity control; Probability distribution; Pursuit algorithms; Sampling methods; Stochastic processes; USA Councils;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Decision and Control, 2006 45th IEEE Conference on
  • Conference_Location
    San Diego, CA
  • Print_ISBN
    1-4244-0171-2
  • Type

    conf

  • DOI
    10.1109/CDC.2006.377724
  • Filename
    4177233