Title :
Gaussian processes for informative exploration in reinforcement learning
Author :
Chung, Jen Jen; Lawrance, Nicholas R. J.; Sukkarieh, Salah
Author_Institution :
Australian Centre for Field Robotics, University of Sydney, Sydney, NSW, Australia
Abstract :
This paper presents the iGP-SARSA(λ) algorithm for temporal difference reinforcement learning (RL) with non-myopic information gain considerations. The proposed algorithm uses a Gaussian process (GP) model to approximate the state-action value function, Q, and incorporates the GP's variance measure into the calculation of the discounted information gain value for all future state-actions rolled out from the current state-action. The algorithm was compared against a standard SARSA(λ) algorithm on two simulated examples: a battery charge/discharge problem and a soaring glider problem. Results show that incorporating the information gain value into the action selection encouraged exploration early on, allowing the iGP-SARSA(λ) algorithm to converge to a more profitable reward cycle, while the ε-greedy exploration strategy in the SARSA(λ) algorithm failed to search beyond the locally optimal solution.
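To make the core idea concrete, below is a minimal, hedged sketch of the mechanism the abstract describes: a GP regressor over state-action features provides both an estimate of Q and a predictive variance, and the variance feeds an exploration bonus during action selection. The abstract does not specify the authors' kernel, hyperparameters, or the exact non-myopic roll-out of discounted information gain, so the class and function names (GPQ, select_action), the RBF kernel settings, and the bonus weight beta here are illustrative assumptions, not the paper's implementation.

```python
# Illustrative sketch only (not the authors' iGP-SARSA(lambda) code):
# a GP approximates Q(s, a); its predictive variance acts as a proxy for
# information gain and is added as an exploration bonus at action selection.
import numpy as np

def rbf_kernel(X1, X2, length_scale=1.0, signal_var=1.0):
    """Squared-exponential kernel between rows of X1 and X2 (assumed choice)."""
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return signal_var * np.exp(-0.5 * d2 / length_scale ** 2)

class GPQ:
    """GP regression over state-action features: predicts Q mean and variance."""
    def __init__(self, noise_var=0.01):
        self.X, self.y, self.noise_var = None, None, noise_var

    def update(self, x, q_target):
        """Add one (state-action, Q-target) observation to the GP's data set."""
        x = np.atleast_2d(x)
        self.X = x if self.X is None else np.vstack([self.X, x])
        self.y = np.append(self.y if self.y is not None else [], q_target)

    def predict(self, x):
        """Return the GP posterior mean and variance of Q at state-action x."""
        x = np.atleast_2d(x)
        prior_var = rbf_kernel(x, x)[0, 0]
        if self.X is None:                      # GP prior before any data
            return 0.0, prior_var
        K = rbf_kernel(self.X, self.X) + self.noise_var * np.eye(len(self.y))
        k = rbf_kernel(self.X, x)               # cross-covariances, shape (n, 1)
        mean = (k.T @ np.linalg.solve(K, self.y))[0]
        var = prior_var - (k.T @ np.linalg.solve(K, k))[0, 0]
        return float(mean), max(float(var), 0.0)

def select_action(gp, state, actions, beta=1.0):
    """Pick the action maximizing Q-mean plus a variance-based bonus.
    beta weights the exploration bonus; a stand-in for the paper's
    discounted information gain roll-out, which is not reproduced here."""
    scores = [gp.predict(np.r_[state, a]) for a in actions]
    scores = [mu + beta * np.sqrt(var) for mu, var in scores]
    return actions[int(np.argmax(scores))]

# Example: one observed transition on a 1-D state with two discrete actions.
gp = GPQ()
gp.update(np.array([0.0, 1.0]), q_target=0.5)   # (state=0.0, action=1.0) -> target
print(select_action(gp, np.array([0.0]), actions=[0.0, 1.0]))
```

Under this sketch, an action with an uncertain (high-variance) Q estimate can outscore one with a higher but well-known mean, which is the behavior the abstract credits with escaping the local optimum that traps ε-greedy SARSA(λ).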
Keywords :
Gaussian processes; learning (artificial intelligence); ε-greedy exploration strategy; Gaussian process; battery discharge problem; iGP-SARSA algorithm; informative exploration; nonmyopic information gain; soaring glider problem; state-action value function; temporal difference reinforcement learning; approximation algorithms; batteries; discharges (electric); function approximation; tiles; training
Conference_Titel :
2013 IEEE International Conference on Robotics and Automation (ICRA)
Conference_Location :
Karlsruhe, Germany
Print_ISBN :
978-1-4673-5641-1
DOI :
10.1109/ICRA.2013.6630938