• DocumentCode
    671694
  • Title

    Designing multi-objective multi-armed bandits algorithms: A study

  • Author

    Drugan, Madalina M. ; Nowe, Ann

  • Author_Institution
    Comput. Sci. Dept., Vrije Univ. Brussels, Brussels, Belgium
  • fYear
    2013
  • fDate
    4-9 Aug. 2013
  • Firstpage
    1
  • Lastpage
    8
  • Abstract
    We propose an algorithmic framework for multi-objective multi-armed bandits with multiple rewards. Different partial order relationships from multi-objective optimization can be considered for a set of reward vectors, such as scalarization functions and Pareto search. A scalarization function transforms the multi-objective environment into a single objective environment and are a popular choice in multi-objective reinforcement learning. Scalarization techniques can be straightforwardly implemented into the current multi-armed bandit framework, but the efficiency of these algorithms depends very much on their type, linear or non-linear (e.g. Chebyshev), and their parameters. Using Pareto dominance order relationship allows to explore the multi-objective environment directly, however this can result in large sets of Pareto optimal solutions. In this paper we propose and evaluate the performance of multi-objective MABs using three regret metric criteria. The standard UCB1 is extended to scalarized multi-objective UCB1 and we propose a Pareto UCB1 algorithm. Both algorithms are proven to have a logarithmic upper bound for their expected regret. We also introduce a variant of the scalarized multi-objective UCB1 that removes online inefficient scalarizations in order to improve the algorithm´s efficiency. These algorithms are experimentally compared on multi-objective Bernoulli distributions, Pareto UCB1 being the algorithm with the best empirical performance.
  • Keywords
    Pareto optimisation; learning (artificial intelligence); search problems; Pareto UCB1 algorithm; Pareto dominance order relationship; Pareto optimal solutions; Pareto search; multiobjective Bernoulli distributions; multiobjective MAB; multiobjective environment; multiobjective multiarmed bandits algorithms; multiobjective optimization; multiobjective reinforcement learning; multiple rewards; partial order relationships; reward vectors; scalarization functions; scalarized multiobjective UCB1; single objective environment; standard UCB1; Algorithm design and analysis; Chebyshev approximation; Heuristic algorithms; Measurement; Pareto optimization; Standards; Vectors;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Neural Networks (IJCNN), The 2013 International Joint Conference on
  • Conference_Location
    Dallas, TX
  • ISSN
    2161-4393
  • Print_ISBN
    978-1-4673-6128-6
  • Type

    conf

  • DOI
    10.1109/IJCNN.2013.6707036
  • Filename
    6707036