DocumentCode :
2467929
Title :
Adversarial Multi-Armed Bandit Approach to Stochastic Optimization
Author :
Chang, Hyeong Soo ; Fu, Michael C. ; Marcus, Steven I.
Author_Institution :
Dept. of Comput. Sci. & Eng., Sogang Univ., Seoul
fYear :
2006
fDate :
13-15 Dec. 2006
Firstpage :
5681
Lastpage :
5686
Abstract :
We first present a sampling-based algorithm for solving stochastic optimization problems, based on the Exp3 algorithm of Auer et al. for "adversarial multi-armed bandit problems." The value returned by the algorithm is an upper-bound estimate of the sample-average maximum of the sample average approximation (SAA) problem induced by the algorithm's sampling process, and the estimate converges to the optimal objective-function value as the number of samples goes to infinity. We then extend the Exp3-based algorithm recursively to solve a given finite-horizon Markov decision process (MDP) and analyze its finite-time performance in terms of the expected bias relative to the maximum value of the induced recursive SAA problem, showing that the upper bound on the expected bias approaches zero as the sample size per stage goes to infinity, which yields convergence to the optimal value of the original MDP in the limit.
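The core sampling loop described in the abstract can be illustrated with a minimal Exp3 sketch: each "arm" is a candidate solution, rewards are noisy samples of the objective in [0, 1], and the returned value is the sample-average maximum of the SAA problem induced by the sampling. This is a generic Exp3 illustration, not the paper's exact algorithm; `sample_reward`, `num_arms`, and `gamma` are hypothetical interface names.

```python
import math
import random

def exp3_optimize(sample_reward, num_arms, num_samples, gamma=0.1):
    """Sketch of an Exp3-based sampling loop for stochastic optimization.

    sample_reward(i) returns a noisy reward in [0, 1] for arm i.
    Returns the sample-average maximum over the arms sampled,
    an estimate of the optimal objective-function value.
    (Hypothetical interface; a sketch, not the paper's algorithm.)
    """
    weights = [1.0] * num_arms     # Exp3 exponential weights
    totals = [0.0] * num_arms      # running reward sums per arm
    counts = [0] * num_arms        # pull counts per arm
    for _ in range(num_samples):
        wsum = sum(weights)
        # mix the weight distribution with uniform exploration
        probs = [(1 - gamma) * w / wsum + gamma / num_arms for w in weights]
        i = random.choices(range(num_arms), weights=probs)[0]
        x = sample_reward(i)               # noisy objective sample in [0, 1]
        totals[i] += x
        counts[i] += 1
        xhat = x / probs[i]                # importance-weighted reward estimate
        weights[i] *= math.exp(gamma * xhat / num_arms)
    # sample-average maximum of the induced SAA problem
    return max(totals[i] / counts[i] for i in range(num_arms) if counts[i] > 0)
```

As the number of samples grows, Exp3's regret guarantee concentrates pulls on high-reward arms, so the sample-average maximum converges to the optimal value in the limit, matching the convergence behavior the abstract describes.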
Keywords :
Markov processes; decision theory; optimisation; sampling methods; Markov decision process; multiarmed bandit approach; sample average approximation; sampling-based algorithm; stochastic optimization; Algorithm design and analysis; Approximation algorithms; Computer science; Educational institutions; H infinity control; Probability distribution; Pursuit algorithms; Sampling methods; Stochastic processes; USA Councils;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Proceedings of the 45th IEEE Conference on Decision and Control, 2006
Conference_Location :
San Diego, CA
Print_ISBN :
1-4244-0171-2
Type :
conf
DOI :
10.1109/CDC.2006.377724
Filename :
4177233