مرکز منطقه ای اطلاع رساني علوم و فناوري - Time-varying stochastic multi-armed bandit problems

DocumentCode :

3588075

Title :

Time-varying stochastic multi-armed bandit problems

Author :

Vakili, Sattar ; Qing Zhao ; Yuan Zhou

Author_Institution :

Dept. of Electr. & Comput. Eng., Univ. of California, Davis, Davis, CA, USA

fYear :

2014

Firstpage :

2103

Lastpage :

2107

Abstract :

In this paper, we consider a time-varying stochastic multi-armed bandit (MAB) problem where the unknown reward distribution of each arm can change arbitrarily over time. We obtain a lower bound on the regret order and demonstrate that an online learning algorithm achieves this lower bound. We further consider a piece-wise stationary model of the arm reward distributions and establish the regret performance of an online learning algorithm in terms of the number of change points experienced by the reward distributions over the time horizon.

Keywords :

game theory; learning (artificial intelligence); stochastic processes; MAB problem; arm reward distribution; online learning algorithm; piece-wise stationary model; time-varying stochastic multiarmed bandit problems; unknown reward distribution; Computers; Current measurement; Length measurement; Partitioning algorithms; Stochastic processes; Switches; Time measurement;

fLanguage :

English

Publisher :

ieee

Conference_Titel :

Signals, Systems and Computers, 2014 48th Asilomar Conference on

Print_ISBN :

978-1-4799-8295-0

Type :

conf

DOI :

10.1109/ACSSC.2014.7094845

Filename :

7094845

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3588075