  • DocumentCode
    1798016
  • Title
    Multi-objectivization of reinforcement learning problems by reward shaping
  • Author
    Brys, Tim ; Harutyunyan, Anna ; Vrancx, Peter ; Taylor, Matthew E. ; Kudenko, Daniel ; Nowe, Ann
  • Author_Institution
    AI Lab. at the Vrije Univ. Brussel, Brussels, Belgium
  • fYear
    2014
  • fDate
    6-11 July 2014
  • Firstpage
    2315
  • Lastpage
    2322
  • Abstract
    Multi-objectivization is the process of transforming a single-objective problem into a multi-objective problem. Research in evolutionary optimization has demonstrated that the addition of objectives that are correlated with the original objective can make the resulting problem easier to solve compared to the original single-objective problem. In this paper, we investigate the multi-objectivization of reinforcement learning problems. We propose a novel method for the multi-objectivization of Markov decision problems through the use of multiple reward shaping functions. Reward shaping is a technique to speed up reinforcement learning by including additional heuristic knowledge in the reward signal. The resulting composite reward signal is expected to be more informative during learning, leading the learner to identify good actions more quickly. Good reward shaping functions are by definition correlated with the target value function for the base reward signal, and we show in this paper that adding several correlated signals can help to solve the basic single-objective problem faster and better. We prove that the total ordering of solutions, and by consequence the optimality of solutions, is preserved in this process, and empirically demonstrate the usefulness of this approach on two reinforcement learning tasks: a pathfinding problem and the Mario domain.
  • Keywords
    Markov processes; evolutionary computation; learning (artificial intelligence); optimisation; Mario domain; Markov decision problems; composite reward signal; evolutionary optimization; heuristic knowledge; multiobjective problem; multiobjectivization; multiple reward shaping functions; pathfinding problem; reinforcement learning problems; single objective problem; target value function; Evolutionary computation; Learning (artificial intelligence); Navigation; Pareto optimization; Search problems; Vectors;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Neural Networks (IJCNN), 2014 International Joint Conference on
  • Conference_Location
    Beijing
  • Print_ISBN
    978-1-4799-6627-1
  • Type
    conf
  • DOI
    10.1109/IJCNN.2014.6889732
  • Filename
    6889732