Regional Information Center for Science and Technology - The second order temporal difference error for Sarsa(λ)

DocumentCode :

3269419

Title :

The second order temporal difference error for Sarsa(λ)

Author :

Qiming Fu ; Quan Liu ; Fei Xiao ; Guixin Chen

Author_Institution :

Dept. of Comput. Sci. & Technol., Soochow Univ., Suzhou, China

fYear :

2013

fDate :

16-19 April 2013

Firstpage :

Lastpage :

Abstract :

Traditional reinforcement learning algorithms such as Q-learning, Q(λ), Sarsa, and Sarsa(λ) update the action value function using the temporal difference (TD) error, which is computed from the latest action value function. Starting from the TD error, and addressing the low efficiency and slow convergence of the traditional Sarsa(λ) algorithm, this paper defines the nth-order TD error, applies it to the traditional Sarsa(λ) algorithm, and develops a fast Sarsa(λ) algorithm based on the second-order TD error. The algorithm adjusts the Q value with the second-order TD error and broadcasts the TD error over the whole state-action space, which speeds up convergence. The paper also analyzes the convergence rate; under one-step updates, the analysis shows that the number of iterations depends primarily on γ and ε. Finally, experiments applying the proposed algorithm to traditional reinforcement learning problems show that it achieves both a faster convergence rate and better convergence performance.
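The abstract does not define the nth-order TD error, so the sketch below does not reproduce the paper's method. It only illustrates the conventional tabular Sarsa(λ) baseline the paper starts from, assuming accumulating eligibility traces, ε-greedy action selection, and a hypothetical six-state chain task (`chain_step`). All function names and hyperparameter values are illustrative assumptions, not taken from the paper. The ordinary first-order TD error is δ = r + γQ(s′, a′) − Q(s, a), and each step applies αδ to every state-action pair in proportion to its trace.

```python
import random
from collections import defaultdict


def epsilon_greedy(Q, state, actions, epsilon, rng):
    """With probability epsilon pick a random action, otherwise a greedy one."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    values = [Q[(state, a)] for a in actions]
    best = max(values)
    # Break ties randomly so an all-zero table does not always pick the first action.
    return rng.choice([a for a, v in zip(actions, values) if v == best])


def sarsa_lambda_update(Q, E, s, a, r, s2, a2, done, alpha, gamma, lam):
    """One conventional (first-order) Sarsa(lambda) step with accumulating traces.

    delta = r + gamma * Q(s2, a2) - Q(s, a), with no bootstrap term on a terminal step.
    Every pair with a trace moves by alpha * delta * E(pair); traces then decay by gamma * lam.
    """
    target = r if done else r + gamma * Q[(s2, a2)]
    delta = target - Q[(s, a)]
    E[(s, a)] += 1.0  # accumulating trace for the pair just visited
    for key in list(E):
        Q[key] += alpha * delta * E[key]
        E[key] *= gamma * lam
    return delta


def chain_step(state, action, n_states):
    """Hypothetical toy task: action 1 moves right, action 0 moves left.

    Reaching the rightmost state ends the episode with reward 1; every other reward is 0.
    """
    nxt = min(state + 1, n_states - 1) if action == 1 else max(state - 1, 0)
    done = nxt == n_states - 1
    return nxt, (1.0 if done else 0.0), done


def run_sarsa_lambda(n_states=6, episodes=200, alpha=0.1, gamma=0.9,
                     lam=0.8, epsilon=0.1, seed=0):
    # Hyperparameter defaults are illustrative assumptions, not values from the paper.
    rng = random.Random(seed)
    actions = [0, 1]
    Q = defaultdict(float)
    for _ in range(episodes):
        E = defaultdict(float)  # traces are reset at the start of every episode
        s = 0
        a = epsilon_greedy(Q, s, actions, epsilon, rng)
        done = False
        while not done:
            s2, r, done = chain_step(s, a, n_states)
            a2 = epsilon_greedy(Q, s2, actions, epsilon, rng)
            sarsa_lambda_update(Q, E, s, a, r, s2, a2, done, alpha, gamma, lam)
            s, a = s2, a2
    return Q
```

Calling `run_sarsa_lambda()` returns the learned table as a `defaultdict` keyed by `(state, action)`. In this toy task the pair `(4, 1)` is taken only on the final step of an episode, so its value moves a fraction α of the way toward 1 once per episode.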

Keywords :

convergence; learning (artificial intelligence); state-space methods; Q value; Q(λ); Q-learning; Sarsa(λ) algorithm; action value function; reinforcement learning algorithms; reinforcement learning problems; second order temporal difference error; second-order TD error; state-action space; Algorithm design and analysis; Convergence; Educational institutions; Equations; Learning (artificial intelligence); Machine learning algorithms; Mathematical model; Eligibility Trace; Markov Decision Process; Reinforcement Learning; Sarsa(λ) Algorithm; Second Order TD Error;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Adaptive Dynamic Programming And Reinforcement Learning (ADPRL), 2013 IEEE Symposium on

Conference_Location :

Singapore

ISSN :

2325-1824

Type :

conf

DOI :

10.1109/ADPRL.2013.6614990

Filename :

6614990

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3269419