• DocumentCode
    493370
  • Title
    The QV family compared to other reinforcement learning algorithms
  • Author
    Wiering, Marco A. ; Van Hasselt, Hado
  • Author_Institution
    Dept. of Artificial Intell., Univ. of Groningen, Groningen
  • fYear
    2009
  • fDate
    March 30 2009-April 2 2009
  • Firstpage
    101
  • Lastpage
    108
  • Abstract
    This paper describes several new online model-free reinforcement learning (RL) algorithms. We designed three new RL algorithms, QV2, QVMAX, and QVMAX2, all based on the QV-learning algorithm. In contrast to QV-learning, QVMAX and QVMAX2 are off-policy RL algorithms, while QV2 is a new on-policy RL algorithm. We experimentally compare these algorithms to a large number of other RL algorithms, namely Q-learning, Sarsa, R-learning, Actor-Critic, QV-learning, and ACLA. We show experiments on five maze problems of varying complexity, as well as experimental results on the cart pole balancing problem. The results show that performance can differ widely between algorithms depending on the problem, and that no single RL algorithm always performs best, although on average QV-learning scores highest.
  • Keywords
    learning (artificial intelligence); QVMAX2; QV-learning; QV2; QVMAX; R-learning; actor-critic; cart pole balancing problem; reinforcement learning algorithms; Algorithm design and analysis; Differential equations; Learning; Neural networks; Optimal control; Probability distribution; Stochastic systems;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Adaptive Dynamic Programming and Reinforcement Learning, 2009. ADPRL '09. IEEE Symposium on
  • Conference_Location
    Nashville, TN
  • Print_ISBN
    978-1-4244-2761-1
  • Type
    conf
  • DOI
    10.1109/ADPRL.2009.4927532
  • Filename
    4927532