Title :
Model-free off-policy reinforcement learning in continuous environment
Author :
Wawrzyński, Paweł; Pacut, Andrzej
Author_Institution :
Inst. of Control & Comput. Eng., Warsaw Univ. of Technol., Poland
Abstract :
We introduce a reinforcement learning algorithm for continuous state and action spaces. To construct a control policy, the algorithm exploits the entire history of agent-environment interaction. The policy results from an estimation process based on all available information, rather than from stochastic convergence as in classical reinforcement learning approaches. The policy is derived directly from the history, without any model of the environment. We test the algorithm in a simulated cart-pole swing-up environment. It learns to control this plant in about 100 trials, corresponding to 15 minutes of the plant's real time, which is several times less than the learning time required by other algorithms.
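The abstract's core idea, reusing the whole interaction history even though it was gathered under earlier (behavior) policies, is commonly realized through importance-sampling estimators. The sketch below is an illustrative example of that general technique, not the paper's exact estimator: it weights each stored episode's return by the likelihood ratio between a candidate target policy and the behavior policy that generated the data.

```python
def is_return_estimate(episodes, target_prob, behavior_prob, gamma=0.95):
    """Off-policy estimate of the target policy's expected return,
    computed purely from stored history (model-free).

    episodes:      list of episodes, each a list of (state, action, reward)
    target_prob:   target_prob(s, a) -> action probability under the
                   policy being evaluated (hypothetical interface)
    behavior_prob: behavior_prob(s, a) -> action probability under the
                   policy that actually generated the data
    """
    estimates = []
    for episode in episodes:
        weight, ret, discount = 1.0, 0.0, 1.0
        for s, a, r in episode:
            # Likelihood ratio corrects for the mismatch between the
            # data-gathering policy and the policy under evaluation.
            weight *= target_prob(s, a) / behavior_prob(s, a)
            ret += discount * r
            discount *= gamma
        estimates.append(weight * ret)
    return sum(estimates) / len(estimates)
```

When the target policy equals the behavior policy, every weight is 1 and the estimate reduces to the ordinary mean return over the history; as the target drifts away from the behavior policy, the weights grow more variable, which is the classic trade-off of reusing all past data this way.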
Keywords :
convergence of numerical methods; estimation theory; learning (artificial intelligence); stochastic processes; agent-environment interaction; cart-pole swing-up simulated environment; control policy; estimation process; model-free off-policy reinforcement learning; stochastic convergence; Artificial intelligence; Control engineering computing; Convergence; Dynamic programming; History; Learning; Monte Carlo methods; Space technology; Stochastic processes; Testing
Conference_Title :
Proceedings of the 2004 IEEE International Joint Conference on Neural Networks (IJCNN 2004)
Print_ISBN :
0-7803-8359-1
DOI :
10.1109/IJCNN.2004.1380086