DocumentCode :
2885731
Title :
On a restless multi-armed bandit problem with non-identical arms
Author :
Nayyar, Naumaan ; Gai, Yi ; Krishnamachari, Bhaskar
Author_Institution :
Ming Hsieh Dept. of Electr. Eng., Univ. of Southern California, Los Angeles, CA, USA
fYear :
2011
fDate :
28-30 Sept. 2011
Firstpage :
369
Lastpage :
376
Abstract :
We consider the following learning problem motivated by opportunistic spectrum access in cognitive radio networks. There are N independent Gilbert-Elliott channels with possibly non-identical transition matrices. It is desired to have an online policy to maximize the long-term expected discounted reward from accessing one channel at each time dynamically. While there is a stream of recent results on this problem when the channels are identical, much less is known for the harder case of non-identical channels. We provide the first characterization of the structure of the optimal policy for this problem when the channels can be non-identical, in the Bayesian case (when the transition matrices are known). We also provide the first provably efficient learning algorithm for a non-Bayesian version of this problem (when the transition matrices are unknown). Specifically, for the special case of two positively correlated channels, we use the structure we identify to develop a novel mapping to a different multi-armed bandit with countably-infinite arms, in which each arm corresponds to a threshold-based policy. Using this mapping, we propose a policy that achieves near-logarithmic regret for this problem with respect to an e-optimal solution.
Keywords :
Bayes methods; cognitive radio; matrix algebra; radio networks; Bayesian case; Gilbert-Elliott channels; cognitive radio networks; countably-infinite arms; e-optimal solution; learning problem; long-term expected discounted reward maximization; near-logarithmic regret; nonidentical arms; nonidentical channels; nonidentical transition matrices; opportunistic spectrum access; restless multiarmed bandit problem; threshold-based policy; Bayesian methods; Indexes; Markov processes; Radiation detectors; Steady-state; Switches; Vectors;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
Communication, Control, and Computing (Allerton), 2011 49th Annual Allerton Conference on
Conference_Location :
Monticello, IL
Print_ISBN :
978-1-4577-1817-5
Type :
conf
DOI :
10.1109/Allerton.2011.6120191
Filename :
6120191
Link To Document :
بازگشت