Title :
Time-varying stochastic multi-armed bandit problems
Author :
Vakili, Sattar ; Qing Zhao ; Yuan Zhou
Author_Institution :
Dept. of Electr. & Comput. Eng., Univ. of California, Davis, Davis, CA, USA
Abstract :
In this paper, we consider a time-varying stochastic multi-armed bandit (MAB) problem where the unknown reward distribution of each arm can change arbitrarily over time. We obtain a lower bound on the regret order and demonstrate that an online learning algorithm achieves this lower bound. We further consider a piece-wise stationary model of the arm reward distributions and establish the regret performance of an online learning algorithm in terms of the number of change points experienced by the reward distributions over the time horizon.
Keywords :
game theory; learning (artificial intelligence); stochastic processes; MAB problem; arm reward distribution; online learning algorithm; piece-wise stationary model; time-varying stochastic multiarmed bandit problems; unknown reward distribution; Computers; Current measurement; Length measurement; Partitioning algorithms; Stochastic processes; Switches; Time measurement;
Conference_Titel :
Signals, Systems and Computers, 2014 48th Asilomar Conference on
Print_ISBN :
978-1-4799-8295-0
DOI :
10.1109/ACSSC.2014.7094845