Title : 
Reinforcement learning to train Ms. Pac-Man using higher-order action-relative inputs
         
        
            Author : 
Bom, Luuk ; Henken, Ruud ; Wiering, Marco
         
        
            Author_Institution : 
Inst. of Artificial Intell. & Cognitive Eng., Univ. of Groningen, Groningen, Netherlands
         
        
        
        
        
        
            Abstract : 
Reinforcement learning algorithms enable an agent to optimize its behavior by interacting with a specific environment. Although reinforcement learning has produced some very successful applications, how to scale it up to large dynamic environments remains an open research question. In this paper we study the use of reinforcement learning in the popular arcade video game Ms. Pac-Man. To let Ms. Pac-Man learn quickly, we designed feature extraction algorithms that produce higher-order inputs from the game state. We constructed these higher-order features to be relative to the action of Ms. Pac-Man. The action-relative inputs are given to a single neural network, trained using Q-learning, which propagates them sequentially, one action at a time, to obtain the Q-value of each action. The experimental results show that this approach needs only 7 input units in the neural network while still quickly obtaining very good playing behavior. Furthermore, the experiments show that our approach enables Ms. Pac-Man to successfully transfer its learned policy to a different maze on which it was not trained.
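
The idea of a single network evaluated once per action can be illustrated with a minimal sketch. It is not the authors' implementation: the hidden-layer size, tanh activation, learning rate, discount factor, and the random feature vectors are all assumptions made for illustration, and the names `TinyQNet`, `q_values`, and `q_learning_target` are hypothetical. Only the 7 input units, the shared network, and the use of Q-learning come from the abstract.

```python
import math
import random

ACTIONS = ("left", "right", "up", "down")
N_INPUTS = 7  # the abstract reports 7 input units


class TinyQNet:
    """One hidden tanh layer and a single linear output unit (one Q-value)."""

    def __init__(self, n_inputs=N_INPUTS, n_hidden=8, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
                  for row, b in zip(self.w1, self.b1)]
        q = sum(w * h for w, h in zip(self.w2, hidden)) + self.b2
        return q, hidden

    def update(self, x, target, lr=0.02):
        """One gradient step on 0.5 * (target - Q(x))**2; returns the error."""
        q, hidden = self.forward(x)
        err = target - q
        for j, h in enumerate(hidden):
            grad_pre = err * self.w2[j] * (1.0 - h * h)  # backprop through tanh
            self.w2[j] += lr * err * h
            self.b1[j] += lr * grad_pre
            for i, xi in enumerate(x):
                self.w1[j][i] += lr * grad_pre * xi
        self.b2 += lr * err
        return err


def q_values(net, inputs_per_action):
    """Propagate each action's input vector through the same network."""
    return {a: net.forward(x)[0] for a, x in inputs_per_action.items()}


def q_learning_target(net, reward, next_inputs_per_action,
                      gamma=0.95, terminal=False):
    """Standard Q-learning target: r + gamma * max over a' of Q(s', a')."""
    if terminal:
        return reward
    return reward + gamma * max(q_values(net, next_inputs_per_action).values())


# Made-up feature vectors, one per action, standing in for the paper's
# feature extraction algorithms (which the abstract does not specify).
rng = random.Random(1)
state = {a: [rng.random() for _ in range(N_INPUTS)] for a in ACTIONS}
next_state = {a: [rng.random() for _ in range(N_INPUTS)] for a in ACTIONS}

net = TinyQNet()
qs = q_values(net, state)
greedy_action = max(qs, key=qs.get)

# Train only on the inputs of the action that was taken.
target = q_learning_target(net, reward=1.0, next_inputs_per_action=next_state)
error = net.update(state[greedy_action], target)
```

Because the inputs are expressed relative to each candidate action, the same weights score every direction, so the network needs one output unit rather than one per action. An exploration strategy, which the sketch omits, would replace the purely greedy choice during training.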
         
        
            Keywords : 
computer games; feature extraction; learning (artificial intelligence); neural nets; Q-learning; arcade video game Ms. Pac-Man; dynamic environments; game state; higher order action relative inputs; neural network; open research question; reinforcement learning algorithms; smart feature extraction algorithms; train Ms. Pac-Man; Biological neural networks; Games; Heuristic algorithms; Learning (artificial intelligence); Neurons; Training;
         
        
        
        
            Conference_Title : 
2013 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)
         
        
            Conference_Location : 
Singapore
         
        
        
        
            DOI : 
10.1109/ADPRL.2013.6615002