Title :
Maximal expectation as upper confidence bound for multi-armed bandit problems
Author :
Kuo-Yuan Kao ; I-Hao Chen
Author_Institution :
Dept. of Inf. Manage., Penghu Univ. of Sci. & Technol., Penghu, Taiwan
Abstract :
State-of-the-art algorithms for the stochastic multi-armed bandit problem are based on one of two principles: optimism in the face of uncertainty, or probability matching. In this paper, we provide a unified approach that combines these principles. The major result is a new upper confidence bound formula, UCB-max, by which the p-value of an arm's UCB value roughly matches the arm's probability of being optimal. Our numerical study comparing UCB-max with other competitors (UCB-tuned, UCB-V, MOSS, KL-UCB and Thompson sampling) shows that UCB-max is remarkably efficient and stable.
Keywords :
probability; stochastic processes; KL-UCB; MOSS; Thompson sampling; UCB-V; UCB-max; UCB-tuned method; arm UCB value; arm probability; maximal expectation; optimism principles; p-value; probability matching; stochastic multiarmed bandit problem; unified approach; upper confidence bound; upper confidence bound formula; Algorithm design and analysis; Classification algorithms; Indexes; Random variables; bandit problem; machine learning; online learning
Conference_Title :
2014 IEEE 7th Joint International Information Technology and Artificial Intelligence Conference (ITAIC)
Print_ISBN :
978-1-4799-4420-0
DOI :
10.1109/ITAIC.2014.7065060