DocumentCode :
8469
Title :
Experience replay for least-squares policy iteration
Author :
Quan Liu ; Xin Zhou ; Fei Zhu ; Qiming Fu ; Yuchen Fu
Author_Institution :
Sch. of Comput. Sci. & Technol., Soochow Univ., Suzhou, China
Volume :
1
Issue :
3
fYear :
2014
fDate :
July 2014
Firstpage :
274
Lastpage :
281
Abstract :
Policy iteration, which iteratively evaluates and improves the control policy, is a reinforcement learning method. Policy evaluation with the least-squares method can extract more useful information from the empirical data and thereby improve data validity. However, most existing online least-squares policy iteration methods use each sample only once, resulting in low sample utilization. To improve utilization efficiency, we propose experience replay for least-squares policy iteration (ERLSPI) and prove its convergence. The ERLSPI method combines online least-squares policy iteration with experience replay: it stores the samples generated online and reuses them with the least-squares method to update the control policy. We apply ERLSPI to the inverted pendulum system, a typical benchmark problem. The experimental results show that the method effectively exploits previous experience and knowledge, improves sample utilization efficiency, and accelerates convergence.
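The abstract's procedure can be made concrete with a minimal sketch: least-squares policy evaluation (LSTD-Q style) run repeatedly over a buffer of stored transitions. This is an illustrative reconstruction, not the paper's exact implementation; the linear Q-function Q(s, a) = w · phi(s, a), the names `phi`, `lstdq`, and `erlspi`, the ridge term, and the iteration count are all assumptions.

```python
# Sketch of experience replay for least-squares policy iteration.
# Assumes a linear Q-function Q(s, a) = w . phi(s, a); all names and
# parameters here are illustrative, not the paper's implementation.
import numpy as np

def lstdq(buffer, phi, policy, gamma, k):
    """One policy-evaluation sweep: solve A w = b over all stored samples."""
    A = 1e-6 * np.eye(k)              # small ridge term keeps A invertible
    b = np.zeros(k)
    for s, a, r, s_next in buffer:    # every replayed sample contributes
        f = phi(s, a)
        f_next = phi(s_next, policy(s_next))
        A += np.outer(f, f - gamma * f_next)
        b += r * f
    return np.linalg.solve(A, b)

def erlspi(buffer, phi, actions, gamma, k, n_iters=20):
    """Policy iteration that reuses the whole replay buffer each sweep."""
    w = np.zeros(k)
    for _ in range(n_iters):
        # Greedy policy with respect to the current weight vector.
        policy = lambda s, w=w: max(actions, key=lambda a: phi(s, a) @ w)
        w = lstdq(buffer, phi, policy, gamma, k)
    return w
```

The key point the sketch illustrates is that each online sample is appended to `buffer` once but contributes to every subsequent least-squares update, in contrast to online methods that discard a sample after a single use.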
Keywords :
iterative methods; learning (artificial intelligence); least squares approximations; ERLSPI method; control policy; experience replay; inverted pendulum system; online least-squares policy iteration method; reinforcement learning method; Computational efficiency; Convergence; Learning (artificial intelligence); Least squares methods; Markov processes; experience replay; least-squares; policy iteration; reinforcement learning;
fLanguage :
English
Journal_Title :
IEEE/CAA Journal of Automatica Sinica
Publisher :
IEEE
ISSN :
2329-9266
Type :
jour
DOI :
10.1109/JAS.2014.7004685
Filename :
7004685