Regional Information Center for Science and Technology (RICEST) - Maximizing the average reward in episodic reinforcement learning tasks

DocumentCode :

3763563

Title :

Maximizing the average reward in episodic reinforcement learning tasks

Author :

Chris Reinke;Eiji Uchibe;Kenji Doya

Author_Institution :

Okinawa Institute of Science and Technology, Neural Computation Unit, Onna-son, Japan

Year :

2015

Firstpage :

420

Lastpage :

421

Abstract :

We propose an ensemble method consisting of several Q-learning modules to optimize the average reward in episodic Markov decision processes (MDPs). It can be proven that the method learns and optimizes the average reward in MDPs where non-zero rewards are given only by transitions into goal states and where the choice of a trajectory to a goal state is possible only in the start state. We introduce a sampling method for MDPs to show that the average reward can also be optimized to a high degree in MDPs that do not fulfill these conditions.
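To make the objective in the abstract concrete, the following is a minimal sketch, not the authors' algorithm. It assumes the restricted setting the abstract describes (a single non-zero reward on the transition into a goal state, with the trajectory chosen in the start state) and further assumes that episodes restart immediately, so that repeating a trajectory with goal reward r and length n yields an average reward of r/n per step. The trajectory names and numbers are hypothetical, and the comparison against discounted values for several discount factors is only an illustration of why a single discounted learner may not select the trajectory with the best average reward.

```python
from fractions import Fraction

def average_reward(goal_reward, steps):
    """Average reward per time step when the same trajectory is repeated each episode."""
    return Fraction(goal_reward) / steps

def discounted_value(goal_reward, steps, gamma):
    """Discounted return from the start state when the only reward
    arrives on the final transition into the goal state."""
    return gamma ** (steps - 1) * goal_reward

# Hypothetical trajectories: name -> (goal reward, number of steps to the goal)
trajectories = {"near": (1, 2), "mid": (4, 5), "far": (6, 10)}

# Trajectory that maximizes reward per step (the average-reward criterion).
best_by_average = max(trajectories, key=lambda k: average_reward(*trajectories[k]))

def best_by_discount(gamma):
    """Trajectory preferred by a learner that maximizes the discounted return."""
    return max(trajectories, key=lambda k: discounted_value(*trajectories[k], gamma))

if __name__ == "__main__":
    print("average reward prefers:", best_by_average)
    for gamma in (0.5, 0.9, 0.99):
        print(f"gamma={gamma} prefers:", best_by_discount(gamma))
```

In this toy setting the preferred trajectory changes with the discount factor, while the average-reward criterion picks the trajectory with the highest reward-to-length ratio.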

Keywords :

"Trajectory","Learning (artificial intelligence)","Robots","Human computer interaction","Markov processes","Sampling methods"

Publisher :

IEEE

Conference_Title :

2015 International Conference on Intelligent Informatics and Biomedical Sciences (ICIIBMS)

Type :

conf

DOI :

10.1109/ICIIBMS.2015.7439495

Filename :

7439495

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3763563