Title :
Adversarial Multi-Armed Bandit Approach to Stochastic Optimization
Author :
Chang, Hyeong Soo ; Fu, Michael C. ; Marcus, Steven I.
Author_Institution :
Dept. of Comput. Sci. & Eng., Sogang Univ., Seoul
Abstract :
We first present a sampling-based algorithm for solving stochastic optimization problems based on the Exp3 algorithm of Auer et al. for "adversarial multi-armed bandit problems." The value returned by the algorithm is an upper-bound estimate of the sample-average maximum for a sample average approximation (SAA) problem induced by the sampling process of the algorithm, and this estimate converges to the optimal objective-function value as the number of samples goes to infinity. We then recursively extend the Exp3-based algorithm to solve a given finite-horizon Markov decision process (MDP) and analyze its finite-time performance in terms of the expected bias relative to the maximum value of the induced recursive SAA problem. We show that the upper bound on the expected bias vanishes as the sample size per stage goes to infinity, so the algorithm converges to the optimal value of the original MDP in the limit.
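The sampling-based algorithm described above builds on the Exp3 scheme of Auer et al. As a minimal sketch of that underlying bandit update only (not the authors' recursive SAA construction), Exp3 maintains exponentially weighted arm weights, mixes them with uniform exploration, and updates the pulled arm with an importance-weighted reward estimate; the function name `exp3` and its parameters here are illustrative, not from the paper:

```python
import math
import random

def exp3(reward_fn, n_arms, n_rounds, gamma=0.1, rng=None):
    """Sketch of the Exp3 algorithm (Auer et al.) for adversarial bandits.

    reward_fn(arm, t) must return a reward in [0, 1].
    Returns (pull counts per arm, final sampling distribution).
    """
    rng = rng or random.Random()
    weights = [1.0] * n_arms
    counts = [0] * n_arms
    for t in range(n_rounds):
        total = sum(weights)
        # mix the weight-proportional distribution with uniform exploration
        probs = [(1 - gamma) * w / total + gamma / n_arms for w in weights]
        arm = rng.choices(range(n_arms), weights=probs)[0]
        x = reward_fn(arm, t)
        counts[arm] += 1
        # importance-weighted estimate keeps the reward estimator unbiased
        x_hat = x / probs[arm]
        weights[arm] *= math.exp(gamma * x_hat / n_arms)
        # rescale to avoid overflow; probs are invariant to a common scale
        m = max(weights)
        weights = [w / m for w in weights]
    total = sum(weights)
    probs = [(1 - gamma) * w / total + gamma / n_arms for w in weights]
    return counts, probs
```

Run on a toy two-armed Bernoulli instance, the sampling distribution concentrates on the better arm, which is the mechanism the paper exploits to estimate the sample-average maximum.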
Keywords :
Markov processes; decision theory; optimisation; sampling methods; Markov decision process; multiarmed bandit approach; sample average approximation; sampling-based algorithm; stochastic optimization; Algorithm design and analysis; Approximation algorithms; Probability distribution; Pursuit algorithms; Sampling methods; Stochastic processes;
Conference_Titel :
Proceedings of the 45th IEEE Conference on Decision and Control, 2006
Conference_Location :
San Diego, CA, USA
Print_ISBN :
1-4244-0171-2
DOI :
10.1109/CDC.2006.377724