• DocumentCode
    2887180
  • Title

    Adaptive learning of uncontrolled restless bandits with logarithmic regret

  • Author

    Tekin, Cem ; Liu, Mingyan

  • Author_Institution
    Dept. of Electr. Eng. & Comput. Sci., Univ. of Michigan, Ann Arbor, MI, USA
  • fYear
    2011
  • fDate
    28-30 Sept. 2011
  • Firstpage
    983
  • Lastpage
    990
  • Abstract
    In this paper we consider the problem of learning the optimal policy for the uncontrolled restless bandit problem. In this problem only the state of the selected arm can be observed, the state transitions are independent of control and the transition law is unknown. We propose a learning algorithm which gives logarithmic regret uniformly over time with respect to the optimal finite horizon policy with known transition law under some assumptions on the transition probabilities of the arms and the structure of the optimal stationary policy for the infinite horizon average reward problem.
  • Keywords
    Markov processes; computational complexity; learning (artificial intelligence); adaptive learning algorithm; infinite horizon average reward problem; logarithmic regret; optimal finite horizon policy; optimal stationary policy; state transition; transition law; transition probability; uncontrolled restless bandit problem; Equations; History; Indexes; Markov processes; Mathematical model; Upper bound; Vectors;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Communication, Control, and Computing (Allerton), 2011 49th Annual Allerton Conference on
  • Conference_Location
    Monticello, IL
  • Print_ISBN
    978-1-4577-1817-5
  • Type

    conf

  • DOI
    10.1109/Allerton.2011.6120273
  • Filename
    6120273