DocumentCode
77372
Title
Learning in a Changing World: Restless Multiarmed Bandit With Unknown Dynamics
Author
Haoyang Liu ; Keqin Liu ; Qing Zhao
Author_Institution
Electr. & Comput. Eng., UC Davis, Davis, CA, USA
Volume
59
Issue
3
fYear
2013
fDate
Mar-13
Firstpage
1902
Lastpage
1916
Abstract
We consider the restless multiarmed bandit problem with unknown dynamics in which a player chooses one out of N arms to play at each time. The reward state of each arm transits according to an unknown Markovian rule when it is played and evolves according to an arbitrary unknown random process when it is passive. The performance of an arm selection policy is measured by regret, defined as the reward loss with respect to the case where the player knows which arm is the most rewarding and always plays the best arm. We construct a policy with an interleaving exploration and exploitation epoch structure that achieves a regret with logarithmic order. We further extend the problem to a decentralized setting where multiple distributed players share the arms without information exchange. Under both an exogenous restless model and an endogenous restless model, we show that a decentralized extension of the proposed policy preserves the logarithmic regret order as in the centralized setting. The results apply to adaptive learning in various dynamic systems and communication networks, as well as financial investment.
Keywords
Markov processes; learning (artificial intelligence); adaptive learning; arbitrary unknown random process; arm selection policy; changing world; communication networks; decentralized setting; dynamic system; endogenous restless model; exogenous restless model; exploitation epoch structure; financial investment; information exchange; interleaving exploration; logarithmic order; logarithmic regret order; multiple distributed players; restless multiarmed bandit problem; unknown Markovian rule; unknown dynamics; Bayesian methods; Fading; Indexes; Loss measurement; Markov processes; Random processes; Transient analysis; Distributed learning; online learning; regret; restless multiarmed bandit (RMAB);
fLanguage
English
Journal_Title
Information Theory, IEEE Transactions on
Publisher
ieee
ISSN
0018-9448
Type
jour
DOI
10.1109/TIT.2012.2230215
Filename
6362216
Link To Document