Postponed Updates for Temporal-Difference Reinforcement Learning

Author

Van Seijen, Harm; Whiteson, Shimon

Author_Institution

TNO Defence, Security & Safety, The Hague, Netherlands

fYear

2009

fDate

Nov. 30-Dec. 2, 2009

Firstpage

665

Lastpage

672

Abstract

This paper presents postponed updates, a new strategy for temporal-difference (TD) methods that can improve sample efficiency without incurring the computational and space requirements of model-based reinforcement learning (RL). By recording its experience from the last visit to a state, the agent can delay the update for that state until the state is revisited, thereby improving the quality of the update. Experimental results demonstrate that postponed updates outperform several competitors, most notably eligibility traces, a traditional way to improve the sample efficiency of TD methods. Postponed updates achieve this without an extra parameter that must be tuned, as is required for eligibility traces.
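The abstract describes the mechanism only in outline, so the sketch below is one possible reading of it, not the authors' algorithm. It assumes tabular TD(0) state-value prediction (the paper concerns TD methods in general), keeps only the most recent transition recorded for each state, and applies the TD update for that transition when the state is revisited. The class name PostponedTD0, the methods observe and flush, and the end-of-episode flush in reverse visit order are hypothetical choices made for illustration.

```python
from collections import defaultdict


class PostponedTD0:
    """Hypothetical sketch: tabular TD(0) prediction with postponed updates.

    Assumption: each state stores only its most recent transition
    (reward, next_state, terminal). The TD(0) update for that transition is
    applied when the state is visited again, or when flush() is called at
    the end of an episode.
    """

    def __init__(self, alpha=0.1, gamma=1.0):
        self.alpha = alpha            # step size
        self.gamma = gamma            # discount factor
        self.V = defaultdict(float)   # value estimates, default 0.0
        self.pending = {}             # state -> (reward, next_state, terminal)

    def _td_update(self, s, r, s_next, terminal):
        # Standard TD(0) target; terminal successors contribute no bootstrap term.
        target = r if terminal else r + self.gamma * self.V[s_next]
        self.V[s] += self.alpha * (target - self.V[s])

    def observe(self, s, r, s_next, terminal):
        # Revisiting s: first apply the update postponed from its previous visit,
        # which now bootstraps from successor values updated in the meantime.
        if s in self.pending:
            self._td_update(s, *self.pending.pop(s))
        # Record the new transition instead of updating immediately.
        self.pending[s] = (r, s_next, terminal)

    def flush(self):
        # End of episode: apply all remaining postponed updates.
        # Assumption: most recently visited state first (dicts keep insertion
        # order, and re-recorded states move to the end), so earlier states
        # bootstrap from values already updated during this flush.
        for s, (r, s_next, terminal) in reversed(list(self.pending.items())):
            self._td_update(s, r, s_next, terminal)
        self.pending.clear()
```

On a two-step chain A -> B -> C with reward 1 on the final step and alpha = 1, the flush updates B before A, so A bootstraps from the already updated V(B) and both values reach 1.0 after a single episode; an immediate TD(0) update of A would have used V(B) = 0. Whether the original method orders or handles end-of-episode updates this way is not stated in the abstract.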

Keywords

learning (artificial intelligence); model-based reinforcement learning; postponed updates; temporal-difference reinforcement learning; Computational efficiency; Delay; Informatics; Intelligent agent; Intelligent systems; Learning; Optimal control; Safety; Security; State estimation; eligibility traces; reinforcement learning

fLanguage

English

Publisher

IEEE

Conference_Title

Intelligent Systems Design and Applications, 2009. ISDA '09. Ninth International Conference on

Conference_Location

Pisa

Print_ISBN

978-1-4244-4735-0

Electronic_ISBN

978-0-7695-3872-3

Type

conf

DOI

10.1109/ISDA.2009.76

Filename

5365052

Link To Document

https://search.isc.ac/dl/search/defaultta.aspx?DTC=49&DC=2845455