DocumentCode :
3269419
Title :
The second order temporal difference error for Sarsa(λ)
Author :
Qiming Fu ; Quan Liu ; Fei Xiao ; Guixin Chen
Author_Institution :
Dept. of Comput. Sci. & Technol., Soochow Univ., Suzhou, China
fYear :
2013
fDate :
16-19 April 2013
Firstpage :
60
Lastpage :
68
Abstract :
Traditional reinforcement learning algorithms, such as Q-learning, Q(λ), Sarsa, and Sarsa(λ), update the action value function using the temporal difference (TD) error, which is computed from the most recent estimate of the action value function. Approaching the problem from the perspective of the TD error, and addressing the low efficiency and slow convergence of the traditional Sarsa(λ) algorithm, this paper defines the nth-order TD error, applies it to the traditional Sarsa(λ) algorithm, and develops a fast Sarsa(λ) algorithm based on the second-order TD error. The algorithm adjusts the Q value with the second-order TD error and broadcasts the TD error over the whole state-action space, which speeds up convergence. The paper also analyzes the convergence rate; under the condition of one-step updates, the results show that the number of iterations depends primarily on γ and ε. Finally, the proposed algorithm is applied to classic reinforcement learning problems, and the results show that it achieves both a faster convergence rate and better convergence performance.
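For orientation, the following is a minimal sketch of the traditional tabular Sarsa(λ) baseline that the abstract starts from: the ordinary (first-order) TD error δ = r + γQ(s', a') − Q(s, a), with accumulating eligibility traces that spread δ over every state-action pair visited so far. It does NOT implement the paper's second-order TD error, whose definition is not given in this record. The function names, the six-state chain environment, and all parameter values are hypothetical choices made for illustration only.

```python
import random
from collections import defaultdict


def sarsa_lambda_update(Q, E, s, a, r, s_next, a_next, alpha, gamma, lam, terminal=False):
    """One step of standard tabular Sarsa(lambda) with accumulating traces.

    Q and E map (state, action) -> float. Returns the first-order TD error.
    (Baseline only; the paper's second-order TD error is not modelled here.)
    """
    target = r if terminal else r + gamma * Q[(s_next, a_next)]
    delta = target - Q[(s, a)]            # ordinary one-step TD error
    E[(s, a)] += 1.0                      # accumulating eligibility trace
    for key in list(E):                   # broadcast delta to every traced pair
        Q[key] += alpha * delta * E[key]
        E[key] *= gamma * lam             # decay traces by gamma * lambda
    return delta


def epsilon_greedy(Q, s, actions, eps, rng):
    """Pick a random action with probability eps, else a greedy one (random tie-break)."""
    if rng.random() < eps:
        return rng.choice(actions)
    best = max(Q[(s, a)] for a in actions)
    return rng.choice([a for a in actions if Q[(s, a)] == best])


def run_chain(n_states=6, episodes=200, alpha=0.1, gamma=0.9, lam=0.8, eps=0.1, seed=0):
    """Hypothetical test problem: a chain 0..n_states-1, reward 1 on reaching the last state."""
    rng = random.Random(seed)
    actions = [0, 1]                      # 0 = move left, 1 = move right
    Q = defaultdict(float)
    goal = n_states - 1
    for _ in range(episodes):
        E = defaultdict(float)            # traces are reset at the start of each episode
        s = 0
        a = epsilon_greedy(Q, s, actions, eps, rng)
        while True:
            s_next = max(0, s - 1) if a == 0 else s + 1
            terminal = s_next == goal
            r = 1.0 if terminal else 0.0
            a_next = epsilon_greedy(Q, s_next, actions, eps, rng)
            sarsa_lambda_update(Q, E, s, a, r, s_next, a_next, alpha, gamma, lam, terminal)
            if terminal:
                break
            s, a = s_next, a_next
    return Q


if __name__ == "__main__":
    Q = run_chain()
    print({s: max((0, 1), key=lambda a: Q[(s, a)]) for s in range(5)})
```

The loop over every traced pair is what lets a single TD error update many state-action values at once; the abstract describes the proposed algorithm as broadcasting its second-order TD error over the whole state-action space, so the error term fed into this update is where the paper's method would depart from this baseline.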
Keywords :
convergence; learning (artificial intelligence); state-space methods; Q value; Q(λ); Q-learning; Sarsa(λ) algorithm; action value function; reinforcement learning algorithms; reinforcement learning problems; second order temporal difference error; second-order TD error; state-action space; Algorithm design and analysis; Convergence; Educational institutions; Equations; Learning (artificial intelligence); Machine learning algorithms; Mathematical model; Eligibility Trace; Markov Decision Process; Reinforcement Learning; Sarsa(λ) Algorithm; Second Order TD Error;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Adaptive Dynamic Programming And Reinforcement Learning (ADPRL), 2013 IEEE Symposium on
Conference_Location :
Singapore
ISSN :
2325-1824
Type :
conf
DOI :
10.1109/ADPRL.2013.6614990
Filename :
6614990