Regional Information Center for Science and Technology (RICEST) - A theoretical and empirical analysis of Expected Sarsa

DocumentCode :

493376

Title :

A theoretical and empirical analysis of Expected Sarsa

Author :

Van Seijen, Harm ; Van Hasselt, Hado ; Whiteson, Shimon ; Wiering, Marco

Author_Institution :

Integrated Syst. Group, TNO Defense, Safety & Security, The Hague

fYear :

2009

fDate :

March 30 2009-April 2 2009

Firstpage :

177

Lastpage :

184

Abstract :

This paper presents a theoretical and empirical analysis of Expected Sarsa, a variation on Sarsa, the classic on-policy temporal-difference method for model-free reinforcement learning. Expected Sarsa exploits knowledge about stochasticity in the behavior policy to perform updates with lower variance. Doing so allows for higher learning rates and thus faster learning. In deterministic environments, Expected Sarsa's updates have zero variance, enabling a learning rate of 1. We prove that Expected Sarsa converges under the same conditions as Sarsa and formulate specific hypotheses about when Expected Sarsa will outperform Sarsa and Q-learning. Experiments in multiple domains confirm these hypotheses and demonstrate that Expected Sarsa has significant advantages over these more commonly used methods.
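
The variance reduction described in the abstract can be illustrated with a minimal tabular sketch. This is not the authors' code. It assumes an epsilon-greedy behavior policy and a NumPy array `Q` indexed as `Q[state, action]`, and all function names are hypothetical. Sarsa bootstraps from the single next action that happened to be sampled, whereas Expected Sarsa bootstraps from the policy-weighted average over all next actions, so the sampling noise of the next action drops out of the target.

```python
import numpy as np


def epsilon_greedy_probs(q_row, epsilon):
    """Action probabilities of an epsilon-greedy policy for one state (assumed policy)."""
    n_actions = len(q_row)
    probs = np.full(n_actions, epsilon / n_actions)
    probs[int(np.argmax(q_row))] += 1.0 - epsilon
    return probs


def sarsa_target(Q, reward, next_state, next_action, gamma):
    """Sarsa target: depends on which next action was actually sampled."""
    return reward + gamma * Q[next_state, next_action]


def expected_sarsa_target(Q, reward, next_state, gamma, epsilon):
    """Expected Sarsa target: expectation over next actions under the policy."""
    probs = epsilon_greedy_probs(Q[next_state], epsilon)
    return reward + gamma * float(np.dot(probs, Q[next_state]))


def expected_sarsa_update(Q, state, action, reward, next_state,
                          alpha, gamma, epsilon):
    """Apply one in-place tabular Expected Sarsa update and return Q."""
    target = expected_sarsa_target(Q, reward, next_state, gamma, epsilon)
    Q[state, action] += alpha * (target - Q[state, action])
    return Q


# Illustrative values only (not taken from the paper).
Q = np.zeros((2, 2))
Q[1] = [1.0, 3.0]
gamma, epsilon = 0.9, 0.2

# Sarsa's target varies with the sampled next action: about 1.9 or 3.7 here.
sarsa_targets = [sarsa_target(Q, 1.0, 1, a, gamma) for a in (0, 1)]

# Expected Sarsa always uses their policy-weighted mean, about 3.52.
es_target = expected_sarsa_target(Q, 1.0, 1, gamma, epsilon)

# With a deterministic transition and reward, a learning rate of 1 is usable.
expected_sarsa_update(Q, 0, 0, 1.0, 1, alpha=1.0, gamma=gamma, epsilon=epsilon)
```

In this toy setting the Expected Sarsa target equals the mean of the two possible Sarsa targets under the policy, which is the sense in which the update removes variance that comes from action selection alone.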

Keywords :

learning (artificial intelligence); stochastic processes; behavior policy; deterministic environment; expected Sarsa analysis; model-free reinforcement learning; on-policy temporal-difference method; stochasticity; zero variance; Artificial intelligence; Convergence; Dynamic programming; Intelligent systems; Optimal control; Probability distribution; Robot control; State estimation; State feedback; Supervised learning;

fLanguage :

English

Publisher :

IEEE

Conference_Titel :

Adaptive Dynamic Programming and Reinforcement Learning, 2009. ADPRL '09. IEEE Symposium on

Conference_Location :

Nashville, TN

Print_ISBN :

978-1-4244-2761-1

Type :

conf

DOI :

10.1109/ADPRL.2009.4927542

Filename :

4927542

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=493376