Title : 
A heuristic Q-learning architecture for fully exploring a world and deriving an optimal policy by model-based planning

Author :
Zhao, Gang ; Tatsumi, Shoji ; Sun, Ruoying

Author_Institution :
Faculty of Engineering, Osaka City University, Japan

Abstract :
For solving Markov decision processes with incomplete information in robot learning tasks, model-based algorithms make effective use of gathered data but usually require extensive computation. Dyna-Q is an architecture that uses experience to build a model and simultaneously uses that model to adjust the policy; however, it does not help an agent explore its environment actively. In this paper, we present an Exa-Q architecture that learns models and plans with the learned models to help the reinforcement learning agent explore the environment actively and improve its estimate of the reinforcement function. As a result, the Exa-Q architecture can identify an environment fully and speed up the derivation of the optimal policy. Experimental results demonstrate that the proposed method is efficient.
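The abstract positions Exa-Q against Dyna-Q, which interleaves three steps on every real transition: a direct Q-learning update, a model update, and several planning updates replayed from the learned model. The sketch below is a minimal tabular version of standard Dyna-Q only, not the Exa-Q architecture (whose mechanism is not described in this record). It assumes a deterministic environment, epsilon-greedy exploration, and a hypothetical 6-state chain environment `chain_step`; the function name `dyna_q` and all parameter values are illustrative.

```python
import random


def chain_step(s, a, n=6):
    """Hypothetical deterministic chain: action 0 moves left, 1 moves right.
    Reaching the last state (the goal) yields reward 1, otherwise 0."""
    s2 = max(0, s - 1) if a == 0 else min(n - 1, s + 1)
    return (1.0 if s2 == n - 1 else 0.0), s2


def dyna_q(env_step, n_states, n_actions, start, goal, episodes=50,
           planning_steps=10, alpha=0.5, gamma=0.95, epsilon=0.1, seed=0):
    """Tabular Dyna-Q sketch (assumes a deterministic environment)."""
    rng = random.Random(seed)
    Q = [[0.0] * n_actions for _ in range(n_states)]
    model = {}  # (s, a) -> (r, s') learned from real experience

    def update(s, a, r, s2):
        # One-step Q-learning backup; the goal is terminal (value 0).
        target = r + (0.0 if s2 == goal else gamma * max(Q[s2]))
        Q[s][a] += alpha * (target - Q[s][a])

    for _ in range(episodes):
        s = start
        while s != goal:
            # Epsilon-greedy action selection with random tie-breaking.
            if rng.random() < epsilon:
                a = rng.randrange(n_actions)
            else:
                best = max(Q[s])
                a = rng.choice([i for i, q in enumerate(Q[s]) if q == best])
            r, s2 = env_step(s, a)
            update(s, a, r, s2)        # direct RL from real experience
            model[(s, a)] = (r, s2)    # model learning
            keys = list(model)
            for _ in range(planning_steps):  # planning from the model
                ps, pa = rng.choice(keys)
                pr, ps2 = model[(ps, pa)]
                update(ps, pa, pr, ps2)
            s = s2
    return Q


Q = dyna_q(chain_step, n_states=6, n_actions=2, start=0, goal=5,
           episodes=100, planning_steps=20)
policy = [max(range(2), key=lambda a: Q[s][a]) for s in range(5)]
print(policy)
```

Note that planning here only replays transitions the agent has already experienced, so it sharpens value estimates but does nothing by itself to steer the agent toward unvisited states; that is the gap the abstract says Exa-Q is designed to address.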

Keywords :
Markov processes; heuristic programming; learning (artificial intelligence); optimisation; planning (artificial intelligence); robots; Dyna-Q; Exa-Q architecture; Markov decision processes; heuristic Q-learning architecture; model-based planning; optimal policy; reinforcement learning agent; Business; Computer architecture; Data engineering; Educational institutions; Educational robots; Engineering management; Learning; Orbital robotics; Sun; Training data;

Conference_Title :
Proceedings of the 1999 IEEE International Conference on Robotics and Automation

Conference_Location :
Detroit, MI

Print_ISBN :
0-7803-5180-0

DOI :
10.1109/ROBOT.1999.770413