DocumentCode
592187
Title
Optimality of myopic policy for a class of monotone affine restless multi-armed bandits
Author
Mansourifard, Parisa; Javidi, Tara; Krishnamachari, Bhaskar
Author_Institution
Ming Hsieh Dept. of Electr. Eng., Univ. of Southern California, Los Angeles, CA, USA
fYear
2012
fDate
10-13 Dec. 2012
Firstpage
877
Lastpage
882
Abstract
We formulate a general class of restless multi-armed bandits with n independent and stochastically identical arms. Each arm is in a real-valued state s ∈ [s0, smax]. Selecting an arm in state s yields an immediate reward with expectation R(s). The state of the selected arm then jumps stochastically from its current value s to either smax, with probability p(s), or s0, with probability 1 - p(s). The states of the arms that are not selected evolve according to a function τ(s). We assume that τ(s), p(s), and R(s) are all monotonically increasing affine functions and that τ(s) is a contraction mapping. We then derive a condition on τ(s) under which the simple myopic policy, which at each time selects the arm with the highest immediate expected reward, is optimal. This extends and generalizes recent results in the literature pertaining to arms evolving as two-state Markov chains.
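The dynamics described above are straightforward to simulate. The following minimal Python sketch implements the model and the myopic policy; the specific affine coefficients chosen for τ(s), p(s), and R(s) are illustrative assumptions (picked so that τ is a contraction on [s0, smax] and p maps into [0, 1]), not values from the paper.

import random

s0, smax = 0.0, 1.0

def tau(s):  # unselected-arm dynamics: increasing affine, contraction (slope 0.5 < 1)
    return 0.5 * s + 0.25

def p(s):    # probability the selected arm jumps to smax: increasing affine, in [0.2, 0.8]
    return 0.2 + 0.6 * s

def R(s):    # expected immediate reward: increasing affine
    return s

def myopic_step(states):
    # Myopic policy: play the arm with the highest immediate expected reward R(s).
    i = max(range(len(states)), key=lambda j: R(states[j]))
    reward = R(states[i])
    # Unselected arms evolve deterministically via tau; the selected arm
    # jumps to smax with probability p(s) and falls to s0 otherwise.
    nxt = [tau(s) for s in states]
    nxt[i] = smax if random.random() < p(states[i]) else s0
    return reward, nxt

states = [random.uniform(s0, smax) for _ in range(5)]  # n = 5 arms
total = 0.0
for t in range(1000):
    r, states = myopic_step(states)
    total += r
print("average per-step myopic reward:", total / 1000)

Note that because R is increasing, the myopic policy here reduces to playing the arm with the highest state.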
Keywords
Markov processes; decision making; optimisation; probability; arm selection; contraction mapping; highest immediate reward; immediate reward with expectation; independent identical arms; monotone affine restless multiarmed bandit; monotonically increasing affine functions; myopic policy optimality; real-valued state; stochastic decision problem; stochastically identical arms; two-state Markov chain; Bayesian methods; Educational institutions; Indexes; Linearity; Switches; Vectors
fLanguage
English
Publisher
ieee
Conference_Title
2012 IEEE 51st Annual Conference on Decision and Control (CDC)
Conference_Location
Maui, HI
ISSN
0743-1546
Print_ISBN
978-1-4673-2065-8
Type
conf
DOI
10.1109/CDC.2012.6425858
Filename
6425858