• DocumentCode
    1390102
  • Title
    Opportunistic spectrum access based on a constrained multi-armed bandit formulation
  • Author
    Ai, Jing; Abouzeid, Alhussein A.
  • Author_Institution
    Juniper Networks Inc., Sunnyvale, CA, 94089
  • Volume
    11
  • Issue
    2
  • fYear
    2009
  • fDate
    4/1/2009 12:00:00 AM
  • Firstpage
    134
  • Lastpage
    147
  • Abstract
    Tracking and exploiting instantaneous spectrum opportunities are fundamental challenges in opportunistic spectrum access (OSA) in the presence of bursty primary-user traffic and the limited spectrum-sensing capability of secondary users. To take advantage of the history of spectrum sensing and access decisions, a sequential decision framework is widely used to design optimal policies. However, many existing schemes based on a partially observed Markov decision process (POMDP) framework yield optimal policies that are non-stationary, which makes them difficult to compute and implement. This work therefore pursues stationary OSA policies, which are efficient yet low-complexity, while still incorporating practical factors such as spectrum-sensing errors and a priori unknown statistical spectrum knowledge. First, with an approximation of the channel evolution, OSA is formulated in a multi-armed bandit (MAB) framework; as a result, the optimal policy is specified by the well-known Gittins index rule, under which the channel with the largest Gittins index is always selected. Then, closed-form formulas with tunable approximation are derived for the Gittins indices, and a reinforcement learning algorithm is designed to calculate them, depending on whether the Markovian channel parameters are available a priori. Finally, extensive experiments demonstrate the superiority of the scheme over other existing schemes in terms of the quality of policies and optimality.
  • Keywords
    Multi-armed bandit (MAB) problem; opportunistic spectrum access (OSA); partially observed Markov decision process (POMDP); reinforcement learning (RL);
  • fLanguage
    English
  • Journal_Title
    Journal of Communications and Networks
  • Publisher
    IEEE
  • ISSN
    1229-2370
  • Type
    jour
  • DOI
    10.1109/JCN.2009.6391388
  • Filename
    6391388
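
The index-rule policy described in the abstract can be illustrated with a minimal sketch: each channel keeps a Bayesian posterior over its idle probability, and the secondary user always senses the channel with the largest index. The `index()` formula below is a hypothetical stand-in (posterior mean plus a discount-weighted exploration bonus), not the paper's closed-form Gittins index, which is not reproduced here.

```python
import random


class Channel:
    """Two-state (busy/idle) channel sensed by a secondary user."""

    def __init__(self, p_idle):
        self.p_idle = p_idle  # true idle probability (unknown to the policy)
        self.successes = 1    # Beta(1, 1) uniform prior over idle probability
        self.failures = 1

    def index(self, discount=0.9):
        # Hypothetical stand-in for a Gittins-style index: posterior mean
        # plus an exploration bonus that shrinks as the channel is sampled.
        n = self.successes + self.failures
        mean = self.successes / n
        return mean + (discount / (1.0 - discount)) ** 0.5 / n


def select_channel(channels):
    # Index rule: always access the channel with the largest current index.
    return max(range(len(channels)), key=lambda i: channels[i].index())


def run(channels, horizon, rng):
    """Simulate OSA for `horizon` slots; return slots with successful access."""
    throughput = 0
    for _ in range(horizon):
        i = select_channel(channels)
        idle = rng.random() < channels[i].p_idle  # idealized sensing outcome
        if idle:
            channels[i].successes += 1
            throughput += 1
        else:
            channels[i].failures += 1
    return throughput
```

Because the index is stationary (a fixed function of each channel's posterior state), the policy avoids the non-stationary computations a POMDP solution would require, which is the practical appeal the abstract highlights.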