Regional Information Center for Science and Technology (RICeST) - Maximal expectation as upper confidence bound for multi-armed bandit problems

DocumentCode :

3580372

Title :

Maximal expectation as upper confidence bound for multi-armed bandit problems

Author :

Kuo-Yuan Kao ; I-Hao Chen

Author_Institution :

Dept. of Inf. Manage., Penghu Univ. of Sci. & Technol., Penghu, Taiwan

fYear :

2014

Firstpage :

325

Lastpage :

329

Abstract :

State-of-the-art algorithms for the stochastic multi-armed bandit problem are based on one of two principles: optimism in the face of uncertainty, and probability matching. In this paper, we provide a unified approach that combines these principles. The major result is a new upper confidence bound formula, UCB-max, by which the p-value of an arm's UCB value roughly matches the arm's probability of being optimal. Our numerical study comparing UCB-max with its competitors (UCB-tuned, UCB-V, MOSS, KL-UCB and Thompson sampling) shows that UCB-max is remarkably efficient and stable.
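
The two principles named in the abstract can be illustrated with a minimal sketch. This record does not give the UCB-max formula, so the code below does not implement it. It shows only the textbook UCB1 index (optimism in the face of uncertainty) and Beta-Bernoulli Thompson sampling (probability matching) on Bernoulli arms. The function names, the `run` helper, and the arm means are hypothetical and chosen purely for illustration.

```python
import math
import random


def ucb1_choose(counts, sums, t):
    """Optimism: pick the arm with the largest UCB1 index (not UCB-max)."""
    for arm, n in enumerate(counts):
        if n == 0:
            return arm  # play every arm once before using the index
    return max(
        range(len(counts)),
        key=lambda a: sums[a] / counts[a]
        + math.sqrt(2.0 * math.log(t) / counts[a]),
    )


def thompson_choose(counts, sums, rng):
    """Probability matching: draw from each arm's Beta posterior
    (uniform Beta(1, 1) prior) and play the arm with the largest draw."""
    draws = [
        rng.betavariate(1.0 + sums[a], 1.0 + counts[a] - sums[a])
        for a in range(len(counts))
    ]
    return max(range(len(counts)), key=lambda a: draws[a])


def run(policy, means, horizon, seed=0):
    """Simulate Bernoulli arms with the given (assumed) means.
    Returns the number of times each arm was played."""
    rng = random.Random(seed)
    counts = [0] * len(means)
    sums = [0.0] * len(means)
    for t in range(1, horizon + 1):
        if policy == "ucb1":
            arm = ucb1_choose(counts, sums, t)
        else:
            arm = thompson_choose(counts, sums, rng)
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    return counts
```

For example, `run("ucb1", [0.2, 0.8], 2000)` and `run("thompson", [0.2, 0.8], 2000)` each return per-arm play counts. With a gap this large, both policies should concentrate their plays on the second arm.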

Keywords :

probability; stochastic processes; KL-UCB; MOSS; Thompson sampling; UCB-V; UCB-max; UCB-tuned method; arm UCB value; arm probability; maximal expectation; optimism principles; p-value; probability matching; stochastic multiarmed bandit problem; unified approach; upper confidence bound; upper confidence bound formula; Algorithm design and analysis; Classification algorithms; Indexes; Manganese; Random variables; bandit problem; machine learning; online learning

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Information Technology and Artificial Intelligence Conference (ITAIC), 2014 IEEE 7th Joint International

Print_ISBN :

978-1-4799-4420-0

Type :

conf

DOI :

10.1109/ITAIC.2014.7065060

Filename :

7065060

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3580372