Regional Information Center for Science and Technology - Temporal difference learning with interpolated n-tuples: Initial results from a simulated car racing environment

DocumentCode :

3477284

Title :

Temporal difference learning with interpolated n-tuples: Initial results from a simulated car racing environment

Author :

Abdullahi, A.A. ; Lucas, Simon M.

Author_Institution :

Sch. of Comput. Sci. & Electron. Eng., Univ. of Essex, Colchester, UK

fYear :

2011

fDate :

Aug. 31 2011-Sept. 3 2011

Firstpage :

321

Lastpage :

328

Abstract :

Evolutionary algorithms have been used successfully in car racing game competitions, such as the ones based on TORCS. This is in contrast to temporal difference learning (TDL), which, despite being a powerful learning algorithm, has not been used to any significant extent within these competitions. We believe that this is mainly due to the difficulty of choosing a good function approximator, the potential instability of the learning behavior (and hence the reliability of the results), and the lack of a forward model, which restricts the choice of TDL algorithms. This paper reports our initial results on using a new type of function approximator designed to be used with TDL for problems with a large number of continuous-valued inputs, where function approximators such as multi-layer perceptrons can be unstable. The approach combines interpolated tables with n-tuple systems. To conduct the research flexibly and efficiently, we developed a new car-racing simulator that runs much more quickly than TORCS and gives us full access to the forward model of the system. We investigate different types of tracks and physics models, and also make comparisons with human drivers and some initial tests with evolutionary learning (EL). The results show that each approach leads to different driving styles, and that either TDL or EL can learn best depending on the details of the environment. Significantly, TDL produced the best results when learning state-action values (similar to Q-learning; no forward model needed). Regarding driving style, TDL consistently learned behaviours that avoid damage, while EL tended to evolve fast but reckless drivers.
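
The abstract gives no implementation details, so the following is only an illustrative sketch of how interpolated tables might be combined with an n-tuple system and trained on state-action values. It is not the authors' code. All names (`InterpolatedNTuple`, `NTupleNetwork`, `q_update`), the grid resolution, the scaling of inputs to [0, 1], the use of one network per discrete action, and the Q-learning-style target are assumptions made for illustration.

```python
import itertools


class InterpolatedNTuple:
    """One n-tuple: a table over a few input dimensions, read by multilinear interpolation.

    Hypothetical sketch; inputs are assumed to be pre-scaled to [0, 1].
    """

    def __init__(self, dims, points_per_dim=5):
        self.dims = tuple(dims)      # indices of the inputs this tuple samples
        self.k = points_per_dim      # grid points per sampled dimension (>= 2)
        self.table = [0.0] * (self.k ** len(self.dims))

    def _corners(self, x):
        """Yield (table_index, weight) for the 2^n grid corners surrounding x."""
        lows, fracs = [], []
        for d in self.dims:
            pos = min(max(x[d], 0.0), 1.0) * (self.k - 1)
            lo = min(int(pos), self.k - 2)   # keep lo + 1 inside the grid
            lows.append(lo)
            fracs.append(pos - lo)
        for bits in itertools.product((0, 1), repeat=len(self.dims)):
            index, weight = 0, 1.0
            for lo, f, b in zip(lows, fracs, bits):
                index = index * self.k + lo + b
                weight *= f if b else 1.0 - f
            yield index, weight

    def value(self, x):
        return sum(self.table[i] * w for i, w in self._corners(x))

    def update(self, x, delta):
        # Share the correction among the corners in proportion to their weights.
        for i, w in self._corners(x):
            self.table[i] += delta * w


class NTupleNetwork:
    """Sum of several interpolated n-tuples, used as a value-function approximator."""

    def __init__(self, tuples):
        self.tuples = list(tuples)

    def value(self, x):
        return sum(t.value(x) for t in self.tuples)

    def update(self, x, target, alpha=0.1):
        error = target - self.value(x)
        step = alpha * error / len(self.tuples)
        for t in self.tuples:
            t.update(x, step)
        return error


def q_update(nets, s, a, r, s_next, gamma=0.9, alpha=0.1):
    """Q-learning-style update with one network per discrete action (an assumption)."""
    target = r + gamma * max(net.value(s_next) for net in nets)
    return nets[a].update(s, target, alpha)
```

Because the update only needs the observed next state and reward, a sketch like this requires no forward model, which matches the abstract's remark about learning state-action values.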

Keywords :

evolutionary computation; learning (artificial intelligence); traffic engineering computing; TORCS; car racing game competition; car racing simulator; continuous-valued inputs; evolutionary algorithm; evolutionary learning; forward model; function approximator; interpolated n-tuples; learning algorithm; learning behavior; multilayer perceptron; reckless drivers; simulated car racing environment; state-action values; temporal difference learning; Computational intelligence; Computational modeling; Function approximation; Games; Sensors; Software;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Computational Intelligence and Games (CIG), 2011 IEEE Conference on

Conference_Location :

Seoul

Print_ISBN :

978-1-4577-0010-1

Electronic_ISBN :

978-1-4577-0009-5

Type :

conf

DOI :

10.1109/CIG.2011.6032023

Filename :

6032023

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3477284