DocumentCode :
3520891
Title :
Gaussian processes for informative exploration in reinforcement learning
Author :
Chung, Jen Jen ; Lawrance, Nicholas R. J. ; Sukkarieh, Salah
Author_Institution :
Australian Centre for Field Robotics, Univ. of Sydney, Sydney, NSW, Australia
fYear :
2013
fDate :
6-10 May 2013
Firstpage :
2633
Lastpage :
2639
Abstract :
This paper presents the iGP-SARSA(λ) algorithm for temporal difference reinforcement learning (RL) with non-myopic information gain considerations. The proposed algorithm uses a Gaussian process (GP) model to approximate the state-action value function, Q, and incorporates the variance measure from the GP into the calculation of the discounted information gain value for all future state-action pairs rolled out from the current state-action pair. The algorithm was compared against a standard SARSA(λ) algorithm on two simulated examples: a battery charge/discharge problem and a soaring glider problem. Results show that incorporating the information gain value into the action selection encouraged exploration early on, allowing the iGP-SARSA(λ) algorithm to converge to a more profitable reward cycle, while the ε-greedy exploration strategy in the SARSA(λ) algorithm failed to search beyond the locally optimal solution.
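Illustrative Sketch :
The exploration mechanism described in the abstract, a GP posterior over Q whose predictive variance supplies an exploration bonus, can be illustrated with a short sketch. The code below is an assumption-laden simplification, not the paper's iGP-SARSA(λ): it scores actions myopically with a UCB-style bonus (mean + beta * std) rather than the non-myopic discounted information gain rolled out over future state-actions, it refits the GP from scratch on each update, and the class name GPQ and the weight beta are hypothetical. It assumes scikit-learn's GaussianProcessRegressor as the GP model.

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

class GPQ:
    """GP approximation of Q(s, a) with a variance-based exploration bonus."""

    def __init__(self, beta=1.0):
        # RBF kernel over concatenated (state, action) feature vectors;
        # alpha adds observation noise for numerical stability.
        self.gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), alpha=1e-3)
        self.beta = beta         # weight on the uncertainty bonus (hypothetical)
        self.X, self.y = [], []  # observed features and value targets

    def update(self, state, action, target):
        # Record the new sample and refit; a naive stand-in for the
        # incremental GP updates an online learner would use.
        self.X.append(np.concatenate([state, action]))
        self.y.append(target)
        self.gp.fit(np.array(self.X), np.array(self.y))

    def select_action(self, state, actions):
        # Score each candidate action by predicted Q plus beta times the
        # GP's predictive standard deviation (a myopic information proxy).
        if not self.X:
            return actions[np.random.randint(len(actions))]
        feats = np.array([np.concatenate([state, a]) for a in actions])
        mean, std = self.gp.predict(feats, return_std=True)
        return actions[int(np.argmax(mean + self.beta * std))]

In a SARSA(λ)-style loop, select_action would replace the ε-greedy choice, and update would be called with the temporal difference target for each visited state-action pair.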
Keywords :
Gaussian processes; learning (artificial intelligence); ε-greedy exploration strategy; Gaussian process; battery discharge problem; iGP-SARSA algorithm; informative exploration; nonmyopic information gain; soaring glider problem; state-action value function; temporal difference reinforcement learning; Approximation algorithms; Batteries; Discharges (electric); Function approximation; Tiles; Training
fLanguage :
English
Publisher :
ieee
Conference_Titel :
2013 IEEE International Conference on Robotics and Automation (ICRA)
Conference_Location :
Karlsruhe, Germany
ISSN :
1050-4729
Print_ISBN :
978-1-4673-5641-1
Type :
conf
DOI :
10.1109/ICRA.2013.6630938
Filename :
6630938