Title :
Finite-time lower bounds for the two-armed bandit problem
Author :
Kulkarni, Sanjeev R. ; Lugosi, Gábor
Author_Institution :
Dept. of Electr. Eng., Princeton Univ., NJ, USA
Date :
1 April 2000
Abstract :
We obtain minimax lower bounds on the regret for the classical two-armed bandit problem. We provide a finite-sample minimax version of the well-known log n asymptotic lower bound of Lai and Robbins (1985). The finite-time lower bound allows us to derive conditions for the amount of time necessary to make any significant gain over a random guessing strategy. These bounds depend on the class of possible distributions of the rewards associated with the arms. For example, in contrast to the log n asymptotic results on the regret, we show that the minimax regret is achieved by mere random guessing under fairly mild conditions on the set of allowable configurations of the two arms. That is, we show that for every allocation rule and for every n, there is a configuration such that the regret at time n is at least 1-ε times the regret of random guessing, where ε is any small positive constant.
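The random-guessing baseline referred to in the abstract can be illustrated with a short Monte Carlo sketch (not the paper's analysis; the Bernoulli reward model and the parameter values below are illustrative assumptions). For a two-armed Bernoulli bandit with means p1 and p2, the strategy that picks an arm uniformly at random has expected regret n * |p1 - p2| / 2 at time n, and the paper's lower bound says no allocation rule can do much better uniformly over configurations:

```python
import random

def random_guessing_regret(p1, p2, n, trials=2000, seed=0):
    """Monte Carlo estimate of the regret at time n for the uniform
    random-guessing strategy on a two-armed Bernoulli bandit.

    Regret = n * max(p1, p2) - (expected total reward collected).
    Exact value for this strategy: n * abs(p1 - p2) / 2.
    """
    rng = random.Random(seed)
    best = max(p1, p2)
    total_reward = 0.0
    for _ in range(trials):
        reward = 0
        for _ in range(n):
            # Random guessing: choose each arm with probability 1/2.
            p = p1 if rng.random() < 0.5 else p2
            reward += 1 if rng.random() < p else 0
        total_reward += reward
    return n * best - total_reward / trials

# Illustrative configuration: means 0.6 and 0.4, horizon n = 100.
# Exact expected regret of random guessing is 100 * 0.2 / 2 = 10.
print(random_guessing_regret(0.6, 0.4, 100))
```

The estimate concentrates around the exact value n * |p1 - p2| / 2 as the number of Monte Carlo trials grows.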
Keywords :
minimax techniques; random processes; allowable configurations; asymptotic lower bound; finite-sample minimax version; finite-time lower bounds; minimax lower bounds; two-armed bandit problem
Journal_Title :
IEEE Transactions on Automatic Control