DocumentCode
2887180
Title
Adaptive learning of uncontrolled restless bandits with logarithmic regret
Author
Tekin, Cem ; Liu, Mingyan
Author_Institution
Dept. of Electr. Eng. & Comput. Sci., Univ. of Michigan, Ann Arbor, MI, USA
fYear
2011
fDate
28-30 Sept. 2011
Firstpage
983
Lastpage
990
Abstract
In this paper we consider the problem of learning the optimal policy for the uncontrolled restless bandit problem. In this problem only the state of the selected arm can be observed, the state transitions are independent of control and the transition law is unknown. We propose a learning algorithm which gives logarithmic regret uniformly over time with respect to the optimal finite horizon policy with known transition law under some assumptions on the transition probabilities of the arms and the structure of the optimal stationary policy for the infinite horizon average reward problem.
Keywords
Markov processes; computational complexity; learning (artificial intelligence); adaptive learning algorithm; infinite horizon average reward problem; logarithmic regret; optimal finite horizon policy; optimal stationary policy; state transition; transition law; transition probability; uncontrolled restless bandit problem; Equations; History; Indexes; Markov processes; Mathematical model; Upper bound; Vectors;
fLanguage
English
Publisher
ieee
Conference_Titel
Communication, Control, and Computing (Allerton), 2011 49th Annual Allerton Conference on
Conference_Location
Monticello, IL
Print_ISBN
978-1-4577-1817-5
Type
conf
DOI
10.1109/Allerton.2011.6120273
Filename
6120273
Link To Document