Regional Information Center for Science and Technology (RICeST) - Discount and speed/execution tradeoffs in Markov Decision Process games

DocumentCode :

3476570

Title :

Discount and speed/execution tradeoffs in Markov Decision Process games

Author :

Uribe, Rosa ; Lozano, Fernando ; Shibata, Kenji ; Anderson, C.

Year :

2011

Date :

Aug. 31 2011-Sept. 3 2011

Firstpage :

Lastpage :

Abstract :

We study Markov Decision Process (MDP) games with the usual ±1 reinforcement signal. We consider the scenario in which the goal of the game, rather than just winning, is to maximize the number of wins in an allotted period of time (or to maximize the expected reward in the same period). In the reinforcement learning literature, this type of tradeoff is often handled by tuning the discount parameter to encourage the learning algorithm to find policies that take fewer steps on average, at the cost of a lower probability of winning. We show that this approach is not guaranteed to solve the tradeoff problem optimally, and hence a different strategy is needed when tackling problems of this type.
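
The tradeoff described in the abstract can be illustrated with a toy calculation. The sketch below is not taken from the paper: the two policies, their win probabilities and episode lengths, and the function names are all hypothetical, and it assumes fixed-length episodes with a single ±1 reward at the final step, discounted by `gamma ** steps`. It compares the policy favored by the discounted return with the policy favored by the long-run reward per time step (expected episode reward divided by episode length, for episodes played back to back).

```python
# Toy illustration only: policy names and numbers are hypothetical, not from the paper.
# Each policy is (probability of winning, number of steps per episode).
policies = {"careful": (0.9, 10), "fast": (0.6, 2)}


def discounted_value(p_win, steps, gamma):
    """Expected discounted return, assuming the only reward (+1 win, -1 loss)
    arrives at the final step and is discounted by gamma ** steps."""
    return gamma ** steps * (2 * p_win - 1)


def reward_rate(p_win, steps):
    """Long-run expected reward per time step when episodes repeat back to back:
    expected reward per episode divided by episode length."""
    return (2 * p_win - 1) / steps


def best_by_discount(gamma):
    """Name of the policy with the highest discounted value for this gamma."""
    return max(policies, key=lambda name: discounted_value(*policies[name], gamma))


def best_by_rate():
    """Name of the policy with the highest reward per time step."""
    return max(policies, key=lambda name: reward_rate(*policies[name]))


if __name__ == "__main__":
    print("best by reward rate:", best_by_rate())
    for gamma in (0.99, 0.9, 0.8):
        print(f"gamma={gamma}: best by discounted value:", best_by_discount(gamma))
```

In this toy setting the discounted criterion agrees with the reward-rate criterion only for some values of `gamma`, and which values work depends on the particular policies being compared. This shows why tuning the discount requires care; it does not reproduce the paper's argument that such tuning cannot be guaranteed to be optimal.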

Keywords :

Markov processes; game theory; learning (artificial intelligence); MDP; Markov decision process games; reinforcement learning; tradeoff problem; Computational intelligence; Conferences; Educational institutions; Equations; Games; Learning

Language :

English

Publisher :

IEEE

Conference_Title :

Computational Intelligence and Games (CIG), 2011 IEEE Conference on

Conference_Location :

Seoul

Print_ISBN :

978-1-4577-0010-1

Electronic_ISBN :

978-1-4577-0009-5

Type :

conf

DOI :

10.1109/CIG.2011.6031992

Filename :

6031992

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3476570