DocumentCode
714186
Title
An estimation based allocation rule with super-linear regret and finite lock-on time for time-dependent multi-armed bandit processes
Author
Prokopiou, Prokopis C. ; Caines, Peter E. ; Mahajan, Aditya
Author_Institution
Integrated Program in Neurosci., McGill Univ., Montreal, QC, Canada
fYear
2015
fDate
3-6 May 2015
Firstpage
1299
Lastpage
1306
Abstract
The multi-armed bandit (MAB) problem has been an active area of research since the early 1930s. The majority of the literature restricts attention to i.i.d. or Markov reward processes. In this paper, the finite-parameter MAB problem with time-dependent reward processes is investigated. An upper confidence bound (UCB) based index policy, where the index is computed based on the maximum-likelihood estimate of the unknown parameter, is proposed. This policy locks on to the optimal arm in finite expected time but has super-linear regret. As an example, the proposed index policy is used for minimizing prediction error when each arm is an auto-regressive moving average (ARMA) process.
Keywords
Markov processes; autoregressive moving average processes; maximum likelihood estimation; resource allocation; ARMA process; Markov reward process; UCB based index policy; auto-regressive moving average process; estimation based allocation rule; finite expected time; finite lock-on time; finite-parameter MAB problem; maximum-likelihood estimation; prediction error minimization; superlinear regret; time-dependent multiarmed bandit process; time-dependent reward process; upper confidence bound based index policy; Computers; Indexes; Markov processes; Maximum likelihood estimation; Resource management; Technological innovation;
fLanguage
English
Publisher
ieee
Conference_Titel
Electrical and Computer Engineering (CCECE), 2015 IEEE 28th Canadian Conference on
Conference_Location
Halifax, NS
ISSN
0840-7789
Print_ISBN
978-1-4799-5827-6
Type
conf
DOI
10.1109/CCECE.2015.7129466
Filename
7129466