DocumentCode :
3588075
Title :
Time-varying stochastic multi-armed bandit problems
Author :
Vakili, Sattar ; Qing Zhao ; Yuan Zhou
Author_Institution :
Dept. of Electr. & Comput. Eng., Univ. of California, Davis, Davis, CA, USA
fYear :
2014
Firstpage :
2103
Lastpage :
2107
Abstract :
In this paper, we consider a time-varying stochastic multi-armed bandit (MAB) problem where the unknown reward distribution of each arm can change arbitrarily over time. We obtain a lower bound on the regret order and demonstrate that an online learning algorithm achieves this lower bound. We further consider a piece-wise stationary model of the arm reward distributions and establish the regret performance of an online learning algorithm in terms of the number of change points experienced by the reward distributions over the time horizon.
Keywords :
game theory; learning (artificial intelligence); stochastic processes; MAB problem; arm reward distribution; online learning algorithm; piece-wise stationary model; time-varying stochastic multiarmed bandit problems; unknown reward distribution; Computers; Current measurement; Length measurement; Partitioning algorithms; Stochastic processes; Switches; Time measurement;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
Signals, Systems and Computers, 2014 48th Asilomar Conference on
Print_ISBN :
978-1-4799-8295-0
Type :
conf
DOI :
10.1109/ACSSC.2014.7094845
Filename :
7094845
Link To Document :
بازگشت