Regional Information Center for Science and Technology (RICEST) - Machine learning and nonparametric bandit theory

DocumentCode :

811012

Title :

Machine learning and nonparametric bandit theory

Author :

Lai, Tze-Leung ; Yakowitz, Sidney

Author_Institution :

Dept. of Stat., Stanford Univ., CA, USA

Volume :

Issue :

fYear :

1995

fDate :

7/1/1995 12:00:00 AM

Firstpage :

1199

Lastpage :

1209

Abstract :

In its most basic form, bandit theory is concerned with the design problem of sequentially choosing members from a given collection of random variables so that the regret, i.e., R_n = Σ_j (μ* − μ_j) E[T_n(j)], grows as slowly as possible with increasing n. Here μ_j is the expected value of the bandit arm (i.e., random variable) indexed by j, T_n(j) is the number of times arm j has been selected in the first n decision stages, and μ* = sup_j μ_j. The present paper contributes to the theory by considering the situation in which observations are dependent. To begin with, the dependency is presumed to depend only on past observations of the same arm, but later, we allow that it may be with respect to the entire past and that the set of arms is infinite. This brings queues and, more generally, controlled Markov processes into our purview. Thus our “black-box” methodology is suitable for the case when the only observables are cost values and, in particular, the probability structure and loss function are unknown to the designer. The conclusion of the analysis is that under lenient conditions, using algorithms prescribed herein, risk growth is commensurate with that in the simplest i.i.d. cases. Our methods represent an alternative to stochastic-approximation/perturbation-analysis ideas for tuning queues.
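
The regret formula in the abstract can be illustrated with a small numeric sketch. This is not taken from the paper: the function name `regret`, the three arm means, and the expected pull counts are all hypothetical, and the arm set is assumed finite so that the supremum reduces to a maximum.

```python
from typing import Sequence


def regret(means: Sequence[float], expected_pulls: Sequence[float]) -> float:
    """Regret R_n = sum_j (mu* - mu_j) * E[T_n(j)] for a finite set of arms."""
    if len(means) != len(expected_pulls):
        raise ValueError("means and expected_pulls must have the same length")
    # mu* = sup_j mu_j; with finitely many arms the supremum is a maximum.
    best = max(means)
    return sum((best - mu) * pulls for mu, pulls in zip(means, expected_pulls))


# Hypothetical example: three arms over n = 100 decision stages.
means = [0.5, 0.25, 0.75]            # mu_j for each arm (assumed values)
expected_pulls = [20.0, 10.0, 70.0]  # E[T_n(j)] for each arm (assumed values)
print(regret(means, expected_pulls))  # 0.25*20 + 0.5*10 + 0*70 = 10.0
```

Pulls of the best arm contribute nothing to the sum, so a good selection rule keeps the regret small by making the expected pull counts of the inferior arms grow slowly with n.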

Keywords :

learning (artificial intelligence); queueing theory; random processes; statistical analysis; controlled Markov processes; cost values; expected value; machine learning; nonparametric bandit theory; queues; random variable; risk growth; Arm; Bayesian methods; Industrial engineering; Machine learning; Medical tests; Medical treatment; National security; Read only memory; Sequential analysis; Statistics;

fLanguage :

English

Journal_Title :

Automatic Control, IEEE Transactions on

Publisher :

IEEE

ISSN :

0018-9286

Type :

jour

DOI :

10.1109/9.400491

Filename :

400491

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=811012