مرکز منطقه ای اطلاع رساني علوم و فناوري - On a restless multi-armed bandit problem with non-identical arms

DocumentCode :

2885731

Title :

On a restless multi-armed bandit problem with non-identical arms

Author :

Nayyar, Naumaan ; Gai, Yi ; Krishnamachari, Bhaskar

Author_Institution :

Ming Hsieh Dept. of Electr. Eng., Univ. of Southern California, Los Angeles, CA, USA

fYear :

2011

fDate :

28-30 Sept. 2011

Firstpage :

369

Lastpage :

376

Abstract :

We consider the following learning problem motivated by opportunistic spectrum access in cognitive radio networks. There are N independent Gilbert-Elliott channels with possibly non-identical transition matrices. It is desired to have an online policy to maximize the long-term expected discounted reward from accessing one channel at each time dynamically. While there is a stream of recent results on this problem when the channels are identical, much less is known for the harder case of non-identical channels. We provide the first characterization of the structure of the optimal policy for this problem when the channels can be non-identical, in the Bayesian case (when the transition matrices are known). We also provide the first provably efficient learning algorithm for a non-Bayesian version of this problem (when the transition matrices are unknown). Specifically, for the special case of two positively correlated channels, we use the structure we identify to develop a novel mapping to a different multi-armed bandit with countably-infinite arms, in which each arm corresponds to a threshold-based policy. Using this mapping, we propose a policy that achieves near-logarithmic regret for this problem with respect to an e-optimal solution.

Keywords :

Bayes methods; cognitive radio; matrix algebra; radio networks; Bayesian case; Gilbert-Elliott channels; cognitive radio networks; countably-infinite arms; e-optimal solution; learning problem; long-term expected discounted reward maximization; near-logarithmic regret; nonidentical arms; nonidentical channels; nonidentical transition matrices; opportunistic spectrum access; restless multiarmed bandit problem; threshold-based policy; Bayesian methods; Indexes; Markov processes; Radiation detectors; Steady-state; Switches; Vectors;

fLanguage :

English

Publisher :

ieee

Conference_Titel :

Communication, Control, and Computing (Allerton), 2011 49th Annual Allerton Conference on

Conference_Location :

Monticello, IL

Print_ISBN :

978-1-4577-1817-5

Type :

conf

DOI :

10.1109/Allerton.2011.6120191

Filename :

6120191

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2885731