Title :
Using upper confidence bounds for online learning
Author_Institution :
Inst. for Theor. Comput. Sci., Graz Univ. of Technol., Austria
Abstract :
We show how a standard tool from statistics, namely confidence bounds, can be used to deal elegantly with situations that exhibit an exploitation/exploration trade-off. Our technique for designing and analyzing algorithms for such situations is very general and can be applied whenever an algorithm has to make exploitation-versus-exploration decisions based on uncertain information provided by a random process. We consider two models with such an exploitation/exploration trade-off. For the adversarial bandit problem our new algorithm suffers only Õ(T^(1/2)) regret over T trials, which improves significantly over the previously best Õ(T^(2/3)) regret. We also extend our results for the adversarial bandit problem to shifting bandits. The second model we consider is associative reinforcement learning with linear value functions. For this model our technique improves the regret from Õ(T^(3/4)) to Õ(T^(1/2)).
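The abstract's core idea, acting optimistically on an upper confidence bound rather than on the empirical mean alone, can be illustrated with the generic UCB1 index policy for stochastic arms. This is only a minimal sketch of the confidence-bound principle, not the paper's algorithm (which targets the adversarial setting); the function name and reward interface are hypothetical.

```python
import math
import random

def ucb1(reward_fns, horizon, seed=0):
    """Generic UCB1 index policy (sketch, not the paper's adversarial algorithm).

    reward_fns: list of zero-argument callables returning rewards in [0, 1].
    Returns (total reward collected, pull counts per arm) over `horizon` pulls.
    """
    random.seed(seed)
    n_arms = len(reward_fns)
    counts = [0] * n_arms    # number of pulls of each arm
    sums = [0.0] * n_arms    # cumulative reward of each arm
    total = 0.0
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1      # pull each arm once to initialize its estimate
        else:
            # Optimism: choose the arm maximizing empirical mean plus
            # an exploration bonus that shrinks as the arm is pulled more.
            arm = max(range(n_arms),
                      key=lambda a: sums[a] / counts[a]
                                    + math.sqrt(2 * math.log(t) / counts[a]))
        r = reward_fns[arm]()
        counts[arm] += 1
        sums[arm] += r
        total += r
    return total, counts
```

For example, with two Bernoulli arms of means 0.2 and 0.8, the confidence bonus forces enough early exploration to identify the better arm, after which it is pulled far more often; this is the mechanism behind the O(√T)-type regret bounds discussed above.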
Keywords :
learning (artificial intelligence); random processes; statistical analysis; uncertainty handling; adversarial bandit problem; associative reinforcement learning; exploitation decision; exploration decision; linear value functions; online learning; random process; statistics; uncertain information; upper confidence bounds; Algorithm design and analysis; Computer science; Information analysis; Learning; Random processes; Random variables; Statistical analysis; Uncertainty;
Conference_Title :
Proceedings of the 41st Annual Symposium on Foundations of Computer Science, 2000
Conference_Location :
Redondo Beach, CA
Print_ISBN :
0-7695-0850-2
DOI :
10.1109/SFCS.2000.892116