Title :
Model-free off-policy reinforcement learning in continuous environment
Author :
Wawrzyński, Paweł; Pacut, Andrzej
Author_Institution :
Inst. of Control & Comput. Eng., Warsaw Univ. of Technol., Poland
Abstract :
We introduce a reinforcement learning algorithm for continuous state and action spaces. To construct a control policy, the algorithm exploits the entire history of agent-environment interaction. The policy results from an estimation process based on all available information, rather than from stochastic convergence as in classical reinforcement learning approaches. The policy is derived directly from the history, without any model of the environment. We test the algorithm in a simulated cart-pole swing-up environment. It learns to control this plant in about 100 trials, corresponding to 15 minutes of the plant's real time, which is several times less than the learning time required by other algorithms.
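The abstract's core idea, reusing the whole interaction history even though it was gathered under earlier (behavior) policies, is commonly realized through importance-sampling estimators. The sketch below is an illustrative example of that general technique, not the paper's exact estimator: it weights each stored episode's return by the likelihood ratio between a candidate target policy and the behavior policy that generated the data.

```python
def is_return_estimate(episodes, target_prob, behavior_prob, gamma=0.95):
    """Off-policy estimate of the target policy's expected return,
    computed purely from stored history (model-free).

    episodes:      list of episodes, each a list of (state, action, reward)
    target_prob:   target_prob(s, a) -> action probability under the
                   policy being evaluated (hypothetical interface)
    behavior_prob: behavior_prob(s, a) -> action probability under the
                   policy that actually generated the data
    """
    estimates = []
    for episode in episodes:
        weight, ret, discount = 1.0, 0.0, 1.0
        for s, a, r in episode:
            # Likelihood ratio corrects for the mismatch between the
            # data-gathering policy and the policy under evaluation.
            weight *= target_prob(s, a) / behavior_prob(s, a)
            ret += discount * r
            discount *= gamma
        estimates.append(weight * ret)
    return sum(estimates) / len(estimates)
```

When the target policy equals the behavior policy, every weight is 1 and the estimate reduces to the ordinary mean return over the history; as the target drifts away from the behavior policy, the weights grow more variable, which is the classic trade-off of reusing all past data this way.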
Keywords :
convergence of numerical methods; estimation theory; learning (artificial intelligence); stochastic processes; agent-environment interaction; cart-pole swing-up simulated environment; control policy; estimation process; model-free off-policy reinforcement learning; stochastic convergence; Artificial intelligence; Control engineering computing; Convergence; Dynamic programming; History; Learning; Monte Carlo methods; Space technology; Stochastic processes; Testing
Conference_Title :
Proceedings of the 2004 IEEE International Joint Conference on Neural Networks (IJCNN 2004)
Print_ISBN :
0-7803-8359-1
DOI :
10.1109/IJCNN.2004.1380086