DocumentCode :
2498178
Title :
Improved neural fitted Q iteration applied to a novel computer gaming and learning benchmark
Author :
Gabel, Thomas ; Lutz, Christian ; Riedmiller, Martin
Author_Institution :
Dept. of Comput. Sci., Univ. of Freiburg, Freiburg, Germany
fYear :
2011
fDate :
11-15 April 2011
Firstpage :
279
Lastpage :
286
Abstract :
Neural batch reinforcement learning (RL) algorithms have recently been shown to be a powerful tool for model-free reinforcement learning problems. In this paper, we present a novel learning benchmark from the realm of computer games and apply a variant of a neural batch RL algorithm to this benchmark. Defining the learning problem and appropriately adjusting all relevant parameters is often a tedious task for the researcher who implements and investigates a learning approach. In RL, a suitable choice of the immediate cost function c is crucial, and, when multi-layer perceptron neural networks are used for value function approximation, the definition of c must be well aligned with the specific characteristics of this type of function approximator. Determining this alignment is especially tricky when no a priori knowledge about the task and, hence, about optimal policies is available. To this end, we propose a simple but effective dynamic scaling heuristic that can be seamlessly integrated into contemporary neural batch RL algorithms. We evaluate the effectiveness of this heuristic on the well-known pole swing-up benchmark as well as on the novel gaming benchmark we propose.
Keywords :
computer games; function approximation; iterative methods; learning (artificial intelligence); multilayer perceptrons; a priori knowledge; computer gaming; contemporary neural batch RL algorithms; dynamic scaling heuristic; function approximator; learning benchmark; model-free reinforcement learning problems; multilayer perceptron neural networks; neural batch reinforcement learning algorithms; neural fitted Q iteration; optimal policy; pole swing-up benchmark; value function approximation; Artificial neural networks; Benchmark testing; Games; Heuristic algorithms; Learning; Learning systems; Marine vehicles;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Adaptive Dynamic Programming And Reinforcement Learning (ADPRL), 2011 IEEE Symposium on
Conference_Location :
Paris
Print_ISBN :
978-1-4244-9887-1
Type :
conf
DOI :
10.1109/ADPRL.2011.5967361
Filename :
5967361