Title :
Two Novel On-policy Reinforcement Learning Algorithms based on TD(λ)-methods
Author :
Wiering, Marco A.; Van Hasselt, Hado
Author_Institution :
Department of Information and Computing Sciences, Utrecht University
Abstract :
This paper describes two novel on-policy reinforcement learning algorithms, named QV(λ)-learning and the actor-critic learning automaton (ACLA). Both algorithms learn a state value function using TD(λ)-methods. The difference between the algorithms is that QV-learning uses the learned value function and a form of Q-learning to learn Q-values, whereas ACLA uses the value function and a learning-automaton-like update rule to update the actor. We describe several possible advantages of these methods compared to other value-function-based reinforcement learning algorithms such as Q-learning, Sarsa, and conventional actor-critic methods. Experiments are performed on (1) small, (2) large, (3) partially observable, and (4) dynamic maze problems with tabular and neural network value-function representations, and on the mountain car problem. The overall results show that the two novel algorithms can outperform previously known reinforcement learning algorithms.
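The abstract's description of the two update rules translates into a compact tabular form. Below is a minimal Python sketch of one QV-learning step and one ACLA step; it uses the TD(0) simplification for brevity (the paper uses TD(λ) eligibility traces), and the function signatures, the reduced single-action ACLA preference update, and all hyperparameter values (alpha, beta, gamma) are illustrative assumptions rather than details taken from the paper.

import numpy as np

def qv_update(V, Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    # One QV-learning step: V is learned by TD, while the Q-value of the
    # taken action bootstraps from the learned V(s') instead of
    # max_a Q(s', a) as in Q-learning.
    td_error = r + gamma * V[s_next] - V[s]               # TD(0) error on V
    V[s] += alpha * td_error                              # value-function update
    Q[s, a] += alpha * (r + gamma * V[s_next] - Q[s, a])  # Q bootstraps from V

def acla_update(V, prefs, s, a, r, s_next, alpha=0.1, beta=0.05, gamma=0.99):
    # One (simplified) ACLA step: the critic V is learned by TD as above,
    # while the actor's preference for the taken action is pushed toward 1
    # or 0 depending only on the sign of the TD error, a learning-
    # automaton-like rule. (Any renormalization over the non-taken
    # actions is omitted here.)
    td_error = r + gamma * V[s_next] - V[s]
    V[s] += alpha * td_error
    target = 1.0 if td_error >= 0.0 else 0.0              # automaton-like target
    prefs[s, a] += beta * (target - prefs[s, a])

# Illustrative usage with hypothetical problem sizes:
n_states, n_actions = 10, 4
V_qv, Q = np.zeros(n_states), np.zeros((n_states, n_actions))
V_ac = np.zeros(n_states)
prefs = np.full((n_states, n_actions), 1.0 / n_actions)
qv_update(V_qv, Q, s=0, a=1, r=1.0, s_next=2)
acla_update(V_ac, prefs, s=0, a=1, r=1.0, s_next=2)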
Keywords :
learning (artificial intelligence); learning automata; Q-learning; Q-value; QV-learning; actor-critic learning automaton; learning-automaton-like update rule; neural network value-function representation; on-policy reinforcement learning algorithm; state value function; value-function-based reinforcement learning; Dynamic programming; Intelligent systems; Learning automata; Neural networks; Optimal control; Probability distribution; State estimation; Stochastic systems
Conference_Title :
2007 IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning (ADPRL 2007)
Conference_Location :
Honolulu, HI
Print_ISBN :
1-4244-0706-0
DOI :
10.1109/ADPRL.2007.368200