DocumentCode :
1797299
Title :
The scalarized multi-objective multi-armed bandit problem: An empirical study of its exploration vs. exploitation tradeoff
Author :
Yahyaa, Saba Q. ; Drugan, Madalina M. ; Manderick, Bernard
Author_Institution :
Artificial Intell. Lab., Vrije Univ. Brussel, Brussels, Belgium
fYear :
2014
fDate :
6-11 July 2014
Firstpage :
2290
Lastpage :
2297
Abstract :
The multi-armed bandit (MAB) problem is the simplest sequential decision process with stochastic rewards, where an agent repeatedly chooses among different arms to identify as quickly as possible the optimal arm, i.e. the one with the highest mean reward. Both the knowledge gradient (KG) policy and the upper confidence bound (UCB) policy work well in practice for the MAB problem because they strike a good balance between exploitation and exploration when choosing arms. In the multi-objective MAB (MOMAB) problem, each arm generates a vector of rewards, one per objective, instead of a single scalar reward. In this paper, we extend the KG policy to multi-objective problems using scalarization functions that transform reward vectors into a single scalar reward. We consider different scalarization functions and call the corresponding class of algorithms scalarized KG. We compare the resulting algorithms with the corresponding variants of the multi-objective UCB1 policy (MO-UCB1) on a number of MOMAB problems where the reward vectors are drawn from a multivariate normal distribution. We experimentally compare the exploration versus exploitation trade-offs and conclude that scalarized KG outperforms MO-UCB1 on these test problems.
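A minimal sketch (not taken from the paper) of the two scalarization functions the abstract and keywords allude to, a linear weighted sum and a weighted Chebyshev function, each of which maps a reward vector to a single scalar so that a standard single-objective bandit policy such as UCB1 or KG can be applied; the weight vector w and reference point z below are illustrative assumptions:

    import numpy as np

    def linear_scalarization(reward, w):
        # Weighted sum across objectives: f(r) = sum_i w_i * r_i
        return float(np.dot(w, reward))

    def chebyshev_scalarization(reward, w, z):
        # Weighted Chebyshev for maximization: f(r) = min_i w_i * (r_i - z_i),
        # where z is a reference point below every arm's mean reward vector.
        return float(np.min(w * (np.asarray(reward) - z)))

    # Example: a 2-objective reward vector drawn from a multivariate normal
    # distribution, as in the paper's test problems (values here are made up).
    rng = np.random.default_rng(0)
    reward = rng.multivariate_normal(mean=[0.5, 0.7], cov=0.01 * np.eye(2))
    w = np.array([0.4, 0.6])   # assumed weights summing to 1
    z = np.zeros(2)            # assumed reference point
    print(linear_scalarization(reward, w))
    print(chebyshev_scalarization(reward, w, z))

Varying the weight vector w across runs sweeps out different trade-offs between the objectives, which is how a scalarized policy can cover the Pareto front of the MOMAB problem.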
Keywords :
decision theory; normal distribution; stochastic processes; KG-policy; MAB-problem; MO-UCB1; MOMAB-problems; exploration-versus-exploitation trade-off; knowledge gradient; multiobjective MAB; multiobjective UCB1-policy; multivariate normal distribution; reward vectors; scalarization functions; scalarized KG; scalarized multiobjective multiarmed bandit problem; sequential decision process; single scalar reward; stochastic rewards; upper confidence bound; Chebyshev approximation; Gaussian distribution; Indexes; Measurement; Pareto optimization; Standards; Vectors;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Neural Networks (IJCNN), 2014 International Joint Conference on
Conference_Location :
Beijing
Print_ISBN :
978-1-4799-6627-1
Type :
conf
DOI :
10.1109/IJCNN.2014.6889390
Filename :
6889390