Title :
A heuristic Q-learning architecture for fully exploring a world and deriving an optimal policy by model-based planning
Author :
Zhao, Gang ; Tatsumi, Shoji ; Sun, Ruoying
Author_Institution :
Fac. of Eng., Osaka City Univ., Japan
Abstract :
For solving Markov decision processes with incomplete information in robot learning tasks, model-based algorithms make effective use of gathered data but usually require extensive computation. Dyna-Q is an architecture that uses experience to build a model and simultaneously uses the model to adjust the policy; however, it does not help an agent explore an environment actively. In this paper, we present the Exa-Q architecture, which learns models and plans with the learned models to help the reinforcement learning agent explore an environment actively and improve the reinforcement function estimate. As a result, the Exa-Q architecture can identify an environment fully and speed up the derivation of the optimal policy. Experimental results demonstrate that the proposed method is efficient.
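For orientation, the Dyna-Q baseline mentioned in the abstract can be sketched as below. This is a minimal illustration of standard tabular Dyna-Q (direct Q-learning update, model learning, and planning from the learned model), not the Exa-Q architecture, whose details the abstract does not give. The corridor environment `chain_step`, the function name `dyna_q`, and all hyperparameter values are assumptions chosen only for illustration.

```python
import random
from collections import defaultdict


def dyna_q(step, start, actions, episodes=50, planning_steps=10,
           alpha=0.5, gamma=0.95, epsilon=0.1, max_steps=1000, seed=0):
    """Tabular Dyna-Q for a deterministic environment (illustrative sketch)."""
    rng = random.Random(seed)
    q = defaultdict(float)  # (state, action) -> estimated value
    model = {}              # (state, action) -> (reward, next_state, done)

    def greedy(s):
        # Pick a highest-valued action, breaking ties at random.
        best = max(q[(s, a)] for a in actions)
        return rng.choice([a for a in actions if q[(s, a)] == best])

    def update(s, a, r, s2, done):
        # One-step Q-learning backup.
        target = r if done else r + gamma * max(q[(s2, b)] for b in actions)
        q[(s, a)] += alpha * (target - q[(s, a)])

    for _ in range(episodes):
        s = start
        for _ in range(max_steps):
            # Epsilon-greedy action selection.
            a = rng.choice(actions) if rng.random() < epsilon else greedy(s)
            r, s2, done = step(s, a)
            update(s, a, r, s2, done)        # learn from real experience
            model[(s, a)] = (r, s2, done)    # record the observed transition
            for _ in range(planning_steps):  # replay transitions from the model
                ps, pa = rng.choice(list(model))
                update(ps, pa, *model[(ps, pa)])
            if done:
                break
            s = s2
    return q, model


def chain_step(state, action):
    """Hypothetical 5-state corridor (states 0..4); reward 1 on reaching state 4."""
    nxt = max(0, min(4, state + action))
    return (1.0 if nxt == 4 else 0.0), nxt, nxt == 4


q, model = dyna_q(chain_step, start=0, actions=[-1, 1])
# Greedy policy for the non-terminal states.
policy = {s: max([-1, 1], key=lambda a: q[(s, a)]) for s in range(4)}
```

Note that exploration here comes only from epsilon-greedy action selection; the planning loop replays transitions the agent has already seen. That is the limitation the abstract attributes to Dyna-Q: the model is not used to direct the agent toward unexplored parts of the environment.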
Keywords :
Markov processes; heuristic programming; learning (artificial intelligence); optimisation; planning (artificial intelligence); robots; Dyna-Q; Exa-Q architecture; Markov decision processes; heuristic Q-learning architecture; model-based planning; optimal policy; reinforcement learning agent; Business; Computer architecture; Data engineering; Educational institutions; Educational robots; Engineering management; Learning; Orbital robotics; Sun; Training data;
Conference_Title :
Robotics and Automation, 1999. Proceedings. 1999 IEEE International Conference on
Conference_Location :
Detroit, MI
Print_ISBN :
0-7803-5180-0
DOI :
10.1109/ROBOT.1999.770413