Title :
Designing multi-objective multi-armed bandits algorithms: A study
Author :
Drugan, Madalina M. ; Nowe, Ann
Author_Institution :
Comput. Sci. Dept., Vrije Univ. Brussels, Brussels, Belgium
Abstract :
We propose an algorithmic framework for multi-objective multi-armed bandits with multiple rewards. Different partial order relationships from multi-objective optimization can be considered for a set of reward vectors, such as scalarization functions and Pareto search. A scalarization function transforms the multi-objective environment into a single objective environment and are a popular choice in multi-objective reinforcement learning. Scalarization techniques can be straightforwardly implemented into the current multi-armed bandit framework, but the efficiency of these algorithms depends very much on their type, linear or non-linear (e.g. Chebyshev), and their parameters. Using Pareto dominance order relationship allows to explore the multi-objective environment directly, however this can result in large sets of Pareto optimal solutions. In this paper we propose and evaluate the performance of multi-objective MABs using three regret metric criteria. The standard UCB1 is extended to scalarized multi-objective UCB1 and we propose a Pareto UCB1 algorithm. Both algorithms are proven to have a logarithmic upper bound for their expected regret. We also introduce a variant of the scalarized multi-objective UCB1 that removes online inefficient scalarizations in order to improve the algorithm´s efficiency. These algorithms are experimentally compared on multi-objective Bernoulli distributions, Pareto UCB1 being the algorithm with the best empirical performance.
Keywords :
Pareto optimisation; learning (artificial intelligence); search problems; Pareto UCB1 algorithm; Pareto dominance order relationship; Pareto optimal solutions; Pareto search; multiobjective Bernoulli distributions; multiobjective MAB; multiobjective environment; multiobjective multiarmed bandits algorithms; multiobjective optimization; multiobjective reinforcement learning; multiple rewards; partial order relationships; reward vectors; scalarization functions; scalarized multiobjective UCB1; single objective environment; standard UCB1; Algorithm design and analysis; Chebyshev approximation; Heuristic algorithms; Measurement; Pareto optimization; Standards; Vectors;
Conference_Titel :
Neural Networks (IJCNN), The 2013 International Joint Conference on
Conference_Location :
Dallas, TX
Print_ISBN :
978-1-4673-6128-6
DOI :
10.1109/IJCNN.2013.6707036