• DocumentCode
    339239
  • Title
    A heuristic Q-learning architecture for fully exploring a world and deriving an optimal policy by model-based planning
  • Author
    Zhao, Gang; Tatsumi, Shoji; Sun, Ruoying
  • Author_Institution
    Fac. of Eng., Osaka City Univ., Japan
  • Volume
    3
  • fYear
    1999
  • fDate
    1999
  • Firstpage
    2078
  • Abstract
    For solving Markov decision processes with incomplete information in robot learning tasks, model-based algorithms make effective use of gathered data but usually require extreme computation. Dyna-Q is an architecture that uses experiences to build a model and simultaneously uses that model to adjust the policy; however, it does not help an agent explore an environment actively. In this paper, we present the Exa-Q architecture, which learns a model and plans with the learned model to help the reinforcement learning agent explore an environment actively and improve its estimate of the reinforcement function. As a result, the Exa-Q architecture can fully identify an environment and speed up learning of the optimal policy. Experimental results demonstrate that the proposed method is efficient.
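    The Dyna-Q architecture the abstract builds on interleaves three steps per real transition: a direct Q-learning update, storing the transition in a learned model, and several planning updates replayed from that model. A minimal tabular sketch on a hypothetical toy corridor task (not the paper's Exa-Q; environment, constants, and function names below are illustrative assumptions):

    ```python
    import random

    # Toy deterministic 1-D corridor: cells 0..5, start at 0,
    # reward 1 for reaching the rightmost (terminal) cell.
    N_STATES = 6
    ACTIONS = (-1, +1)          # step left / step right
    ALPHA, GAMMA, EPS = 0.5, 0.95, 0.1
    PLANNING_STEPS = 10         # model-based updates per real step

    def step(s, a):
        s2 = min(max(s + a, 0), N_STATES - 1)
        return s2, float(s2 == N_STATES - 1)

    def dyna_q(episodes=50, seed=0):
        rng = random.Random(seed)
        q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
        model = {}              # (s, a) -> (s', r), filled from real experience
        for _ in range(episodes):
            s = 0
            while s != N_STATES - 1:
                # epsilon-greedy action selection
                if rng.random() < EPS:
                    a = rng.choice(ACTIONS)
                else:
                    a = max(ACTIONS, key=lambda x: q[(s, x)])
                s2, r = step(s, a)
                # (1) direct Q-learning update from the real transition
                q[(s, a)] += ALPHA * (r + GAMMA * max(q[(s2, x)] for x in ACTIONS)
                                      - q[(s, a)])
                # (2) record the transition in the model
                model[(s, a)] = (s2, r)
                # (3) planning: replay randomly chosen remembered transitions
                for _ in range(PLANNING_STEPS):
                    (ps, pa), (ps2, pr) = rng.choice(list(model.items()))
                    q[(ps, pa)] += ALPHA * (pr + GAMMA * max(q[(ps2, x)] for x in ACTIONS)
                                            - q[(ps, pa)])
                s = s2
        return q

    q = dyna_q()
    # After training, the greedy policy points right along the whole corridor.
    policy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)}
    print(policy)
    ```

    Exa-Q's stated contribution is to drive exploration actively rather than, as in step (3) above, sampling remembered state-action pairs uniformly at random.
    
    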
  • Keywords
    Markov processes; heuristic programming; learning (artificial intelligence); optimisation; planning (artificial intelligence); robots; Dyna-Q; Exa-Q architecture; Markov decision processes; heuristic Q-learning architecture; model-based planning; optimal policy; reinforcement learning agent; Business; Computer architecture; Data engineering; Educational institutions; Educational robots; Engineering management; Learning; Orbital robotics; Sun; Training data
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Proceedings of the 1999 IEEE International Conference on Robotics and Automation
  • Conference_Location
    Detroit, MI
  • ISSN
    1050-4729
  • Print_ISBN
    0-7803-5180-0
  • Type
    conf
  • DOI
    10.1109/ROBOT.1999.770413
  • Filename
    770413