Title :
Deterministic Sequencing of Exploration and Exploitation for Multi-Armed Bandit Problems
Author :
Vakili, Sattar ; Keqin Liu ; Qing Zhao
Author_Institution :
Dept. of Electr. & Comput. Eng., Univ. of California, Davis, Davis, CA, USA
Abstract :
In the Multi-Armed Bandit (MAB) problem, there is a given set of arms with unknown reward models. At each time, a player selects one arm to play, aiming to maximize the total expected reward over a horizon of length T. An approach based on a Deterministic Sequencing of Exploration and Exploitation (DSEE) is developed for constructing sequential arm selection policies. It is shown that for all light-tailed reward distributions, DSEE achieves the optimal logarithmic order of the regret, where regret is defined as the total expected reward loss against the ideal case with known reward models. For heavy-tailed reward distributions, DSEE achieves O(T1/p) regret when the moments of the reward distributions exist up to the pth order for and O(T1/(1+p/2)) for p > 2. With the knowledge of an upper bound on a finite moment of the heavy-tailed reward distributions, DSEE offers the optimal logarithmic regret order. The proposed DSEE approach complements existing work on MAB by providing corresponding results for general reward distributions. Furthermore, with a clearly defined tunable parameter-the cardinality of the exploration sequence, the DSEE approach is easily extendable to variations of MAB, including MAB with various objectives, decentralized MAB with multiple players and incomplete reward observations under collisions, restless MAB with unknown dynamics, and combinatorial MAB with dependent arms that often arise in network optimization problems such as the shortest path, the minimum spanning tree, and the dominating set problems under unknown random weights.
Keywords :
decision theory; statistical distributions; trees (mathematics); DSEE approach; MAB problem; combinatorial MAB approach; decentralized MAB approach; dependent arms; deterministic sequencing of exploration and exploitation approach; dominating set problems; exploration sequence; heavy-tailed reward distributions; light-tailed reward distributions; minimum spanning tree; multiarmed bandit problems; network optimization problems; optimal logarithmic regret order; reward models; sequential arm selection policies; shortest path; unknown random weights; upper bound; Biological system modeling; Complexity theory; Indexes; Markov processes; Random variables; Sequential analysis; Upper bound; Multi-armed bandit; combinatorial multi-armed bandit; decentralized multi-armed bandit; deterministic sequencing of exploration and exploitation; regret; restless multi-armed bandit;
Journal_Title :
Selected Topics in Signal Processing, IEEE Journal of
DOI :
10.1109/JSTSP.2013.2263494