DocumentCode
2467929
Title
Adversarial Multi-Armed Bandit Approach to Stochastic Optimization
Author
Chang, Hyeong Soo ; Fu, Michael C. ; Marcus, Steven I.
Author_Institution
Dept. of Comput. Sci. & Eng., Sogang Univ., Seoul
fYear
2006
fDate
13-15 Dec. 2006
Firstpage
5681
Lastpage
5686
Abstract
We first present a sampling-based algorithm for solving stochastic optimization problems based on the Exp3 algorithm by Auer et al for "adversarial multi-armed bandit problems." The value returned by the algorithm is an upper-bound estimate of the sample-average-maximum for a sample average approximation (SAA) problem induced by the sampling process of the algorithm, and the estimate converges to the optimal objective-function value as the number of samples goes to infinity. We then recursively extend the Exp3-based algorithm for solving a given finite-horizon Markov decision process (MDP) and analyze its finite-time performance in terms of the expected bias relative to the maximum value of the induced recursive SAA problem, showing that the upper bound of the expected bias approaches zero as the sampling size per stage goes to infinity, leading to the convergence to the optimal value of the original MDP problem in the limit
Keywords
Markov processes; decision theory; optimisation; sampling methods; Markov decision process; multiarmed bandit approach; sample average approximation; sampling-based algorithm; stochastic optimization; Algorithm design and analysis; Approximation algorithms; Computer science; Educational institutions; H infinity control; Probability distribution; Pursuit algorithms; Sampling methods; Stochastic processes; USA Councils;
fLanguage
English
Publisher
ieee
Conference_Titel
Decision and Control, 2006 45th IEEE Conference on
Conference_Location
San Diego, CA
Print_ISBN
1-4244-0171-2
Type
conf
DOI
10.1109/CDC.2006.377724
Filename
4177233
Link To Document