مرکز منطقه ای اطلاع رساني علوم و فناوري - Designing multi-objective multi-armed bandits algorithms: A study

DocumentCode :

671694

Title :

Designing multi-objective multi-armed bandits algorithms: A study

Author :

Drugan, Madalina M. ; Nowe, Ann

Author_Institution :

Comput. Sci. Dept., Vrije Univ. Brussels, Brussels, Belgium

fYear :

2013

fDate :

4-9 Aug. 2013

Firstpage :

Lastpage :

Abstract :

We propose an algorithmic framework for multi-objective multi-armed bandits with multiple rewards. Different partial order relationships from multi-objective optimization can be considered for a set of reward vectors, such as scalarization functions and Pareto search. A scalarization function transforms the multi-objective environment into a single objective environment and are a popular choice in multi-objective reinforcement learning. Scalarization techniques can be straightforwardly implemented into the current multi-armed bandit framework, but the efficiency of these algorithms depends very much on their type, linear or non-linear (e.g. Chebyshev), and their parameters. Using Pareto dominance order relationship allows to explore the multi-objective environment directly, however this can result in large sets of Pareto optimal solutions. In this paper we propose and evaluate the performance of multi-objective MABs using three regret metric criteria. The standard UCB1 is extended to scalarized multi-objective UCB1 and we propose a Pareto UCB1 algorithm. Both algorithms are proven to have a logarithmic upper bound for their expected regret. We also introduce a variant of the scalarized multi-objective UCB1 that removes online inefficient scalarizations in order to improve the algorithm´s efficiency. These algorithms are experimentally compared on multi-objective Bernoulli distributions, Pareto UCB1 being the algorithm with the best empirical performance.

Keywords :

Pareto optimisation; learning (artificial intelligence); search problems; Pareto UCB1 algorithm; Pareto dominance order relationship; Pareto optimal solutions; Pareto search; multiobjective Bernoulli distributions; multiobjective MAB; multiobjective environment; multiobjective multiarmed bandits algorithms; multiobjective optimization; multiobjective reinforcement learning; multiple rewards; partial order relationships; reward vectors; scalarization functions; scalarized multiobjective UCB1; single objective environment; standard UCB1; Algorithm design and analysis; Chebyshev approximation; Heuristic algorithms; Measurement; Pareto optimization; Standards; Vectors;

fLanguage :

English

Publisher :

ieee

Conference_Titel :

Neural Networks (IJCNN), The 2013 International Joint Conference on

Conference_Location :

Dallas, TX

ISSN :

2161-4393

Print_ISBN :

978-1-4673-6128-6

Type :

conf

DOI :

10.1109/IJCNN.2013.6707036

Filename :

6707036

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=671694