Improved neural fitted Q iteration applied to a novel computer gaming and learning benchmark

Author

Gabel, Thomas; Lutz, Christian; Riedmiller, Martin

Author_Institution

Dept. of Comput. Sci., Univ. of Freiburg, Freiburg, Germany

fYear

2011

fDate

11-15 April 2011

Firstpage

279

Lastpage

286

Abstract

Neural batch reinforcement learning (RL) algorithms have recently been shown to be a powerful tool for model-free reinforcement learning problems. In this paper, we present a novel learning benchmark from the realm of computer games and apply a variant of a neural batch RL algorithm to this benchmark. Defining the learning problem and appropriately adjusting all relevant parameters is often a tedious task for the researcher who implements and investigates a learning approach. In RL, a suitable choice of the immediate cost function c is crucial, and, when multi-layer perceptron neural networks are used for value function approximation, the definition of c must be well aligned with the specific characteristics of this type of function approximator. Determining this alignment is especially tricky when no a priori knowledge about the task, and hence about optimal policies, is available. To this end, we propose a simple but effective dynamic scaling heuristic that can be seamlessly integrated into contemporary neural batch RL algorithms. We evaluate the effectiveness of this heuristic on the well-known pole swing-up benchmark as well as on the novel gaming benchmark we propose.
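To make the role of the cost function c concrete, the sketch below shows how a neural fitted Q iteration style algorithm builds its supervised training targets from a batch of transitions, using the cost-minimizing update c + gamma * min_b Q(s', b). The helper names (`nfq_targets`, `rescale`, `Transition`), the discount `gamma = 0.95`, and the output range `[0.05, 0.95]` are illustrative assumptions. The `rescale` function is a hypothetical min-max rescaling meant only to show why the magnitude of the targets matters for a sigmoid-output multi-layer perceptron; it is not the dynamic scaling heuristic proposed in the paper, whose details are not given in this record.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical transition layout: (state, action, immediate cost c, successor state, terminal flag)
Transition = Tuple[Sequence[float], int, float, Sequence[float], bool]


def nfq_targets(
    batch: List[Transition],
    q_fn: Callable[[Sequence[float], int], float],
    actions: Sequence[int],
    gamma: float = 0.95,
) -> List[float]:
    """Cost-based Q targets for one batch iteration: c + gamma * min_b Q(s', b)."""
    targets = []
    for s, a, c, s_next, terminal in batch:
        if terminal:
            # No bootstrapping past a terminal state: the target is the immediate cost.
            targets.append(c)
        else:
            # Costs are minimized, so the greedy successor value is a min over actions.
            targets.append(c + gamma * min(q_fn(s_next, b) for b in actions))
    return targets


def rescale(
    targets: List[float], low: float = 0.05, high: float = 0.95
) -> Tuple[List[float], Tuple[float, float]]:
    """Hypothetical min-max rescaling of targets into a sigmoid's non-saturated range.

    Returns the scaled targets plus (t_min, span) so network outputs can be mapped back.
    """
    t_min, t_max = min(targets), max(targets)
    span = (t_max - t_min) or 1.0  # avoid division by zero when all targets are equal
    scaled = [low + (high - low) * (t - t_min) / span for t in targets]
    return scaled, (t_min, span)


if __name__ == "__main__":
    # Toy batch and an untrained Q estimate that predicts zero everywhere.
    batch = [((0.0,), 0, 0.01, (0.1,), False), ((0.1,), 1, 1.0, (0.2,), True)]
    targets = nfq_targets(batch, lambda s, a: 0.0, actions=[0, 1])
    scaled, params = rescale(targets)
    print(targets, scaled, params)
```

With the zero-valued Q estimate, the non-terminal target is just its cost 0.01 and the terminal target is 1.0; the rescaling then maps these approximately to 0.05 and 0.95. If c were instead chosen on a much larger scale, unscaled targets would push a sigmoid output unit into saturation, which is the kind of misalignment between c and the function approximator that the abstract describes.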

Keywords

computer games; function approximation; iterative methods; learning (artificial intelligence); multilayer perceptrons; a priori knowledge; computer gaming; contemporary neural batch RL algorithms; dynamic scaling heuristic; function approximator; learning benchmark; model-free reinforcement learning problems; multilayer perceptron neural networks; neural batch reinforcement learning algorithms; neural fitted Q iteration; optimal policy; pole swing-up benchmark; value function approximation; Artificial neural networks; Benchmark testing; Games; Heuristic algorithms; Learning; Learning systems; Marine vehicles;

fLanguage

English

Publisher

IEEE

Conference_Title

2011 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Conference_Location

Paris

Print_ISBN

978-1-4244-9887-1

Type

conf

DOI

10.1109/ADPRL.2011.5967361

Filename

5967361