Regional Information Center for Science and Technology - Beating the adaptive bandit with high probability

DocumentCode :

1859416

Title :

Beating the adaptive bandit with high probability

Author :

Abernethy, Jacob ; Rakhlin, Alexander

Author_Institution :

Comput. Sci. Div., UC Berkeley, Berkeley, CA

fYear :

2009

fDate :

8-13 Feb. 2009

Firstpage :

280

Lastpage :

289

Abstract :

We provide a principled way of proving Õ(√T) high-probability guarantees for partial-information (bandit) problems over arbitrary convex decision sets. First, we prove a regret guarantee for the full-information problem in terms of "local" norms, both for entropy and self-concordant barrier regularization, unifying these methods. Given one of such algorithms as a black-box, we can convert a bandit problem into a full-information problem using a sampling scheme. The main result states that a high-probability Õ(√T) bound holds whenever the black-box, the sampling scheme, and the estimates of missing information satisfy a number of conditions, which are relatively easy to check. At the heart of the method is a construction of linear upper bounds on confidence intervals. As applications of the main result, we provide the first known efficient algorithm for the sphere with an Õ(√T) high-probability bound. We also derive the result for the n-simplex, improving the O(√(nT log(nT))) bound of Auer et al. [3] by replacing the log T term with log log T and closing the gap to the lower bound of Õ(√(nT)). While Õ(√T) high-probability bounds should hold for general decision sets through our main result, construction of linear upper bounds depends on the particular geometry of the set; we believe that the sphere example already exhibits the necessary ingredients. The guarantees we obtain hold for adaptive adversaries (unlike the in-expectation results of [1]) and the algorithms are efficient, given that the linear upper bounds on confidence can be computed.
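
The reduction outlined in the abstract (a full-information black box, a sampling scheme, and estimates of the missing information) can be illustrated on the n-simplex with a minimal Python sketch. It is an assumption-laden illustration, not the authors' algorithm: it uses plain exponential weights (entropy regularization) with importance-weighted loss estimates in the style of Exp3, omits the linear upper bounds on confidence intervals, and therefore carries only the usual in-expectation behaviour rather than a high-probability guarantee. The names exp_weights_bandit, loss_fn, n_arms, horizon, and eta are hypothetical.

```python
import math
import random


def exp_weights_bandit(loss_fn, n_arms, horizon, eta, seed=0):
    """Exp3-style sketch: exponential weights on importance-weighted losses.

    loss_fn(t, arm) returns a loss in [0, 1]; only the pulled arm is queried,
    which is what makes this a bandit (partial-information) setting.
    Returns the total loss suffered and the final sampling distribution.
    """
    rng = random.Random(seed)
    cum_est = [0.0] * n_arms  # cumulative estimated loss per arm
    total_loss = 0.0
    probs = [1.0 / n_arms] * n_arms
    for t in range(horizon):
        # Full-information black box: exponential weights on estimated losses.
        low = min(cum_est)  # shift by the minimum for numerical stability
        weights = [math.exp(-eta * (c - low)) for c in cum_est]
        z = sum(weights)
        probs = [w / z for w in weights]

        # Sampling scheme: draw a single arm from the current distribution.
        arm = rng.choices(range(n_arms), weights=probs, k=1)[0]
        loss = loss_fn(t, arm)
        total_loss += loss

        # Estimate of missing information: an unbiased loss vector that is
        # zero everywhere except at the pulled arm (importance weighting).
        cum_est[arm] += loss / probs[arm]
    return total_loss, probs
```

For example, calling exp_weights_bandit(lambda t, a: 0.1 if a == 0 else 0.9, 3, 2000, math.sqrt(math.log(3) / (3 * 2000))) runs the sketch against a fixed (non-adaptive) loss sequence in which arm 0 is best; the step size shown is one common textbook choice, not a value taken from the paper.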

Keywords :

computational complexity; optimisation; probability; set theory; adaptive bandit; arbitrary convex decision sets; general decision sets; high-probability bound; partial-information problems; sampling scheme; Computer science; Cost function; Entropy; Heart; Jacobian matrices; Probability; Sampling methods; State estimation; Statistics; Upper bound

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Information Theory and Applications Workshop, 2009

Conference_Location :

San Diego, CA

Print_ISBN :

978-1-4244-3990-4

Type :

conf

DOI :

10.1109/ITA.2009.5044958

Filename :

5044958

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=1859416