• DocumentCode
    2043949
  • Title

    Exploitation/exploration learning for MDP environment

  • Author

    Iwata, Kazunori ; Ito, Nobuhiro ; Yamauchi, K. ; Ishii, Naohiro

  • Author_Institution
    Dept. of Intelligence & Comput. Sci., Nagoya Inst. of Technol., Japan
  • Volume
    1
  • fYear
    2000
  • fDate
    2000
  • Firstpage
    149
  • Abstract
    Reinforcement learning is an effective learning method in an unknown environment, where a supervisor cannot support the learner. The agent needs a large number of trial-and-error interactions to find optimal behaviors. This becomes a serious problem if the agent is in a dynamic environment, because the agent cannot adapt quickly to the changed environment. To overcome this drawback, we propose a new reinforcement learning method for quick adaptation. In the new method, the agent maintains both an exploitation (EI) strategy and an exploration (ER) strategy, alternating between them for each state. In the EI strategy, the agent tries to select the best action using its past memory. In the ER strategy, the agent tries to identify the environment using an estimate of error and to search for new states in an unknown region of the state space. Using these two strategies, the agent can reduce redundant searching in the state space. Experimental results show that the agent adapts quickly to unknown environments.
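    The per-state switching between EI and ER described in the abstract can be illustrated with a minimal sketch. This is an assumption-laden reconstruction, not the authors' algorithm: the class name, the error-threshold switching rule, and the least-visited-action heuristic for ER are all hypothetical choices made for illustration.

    ```python
    from collections import defaultdict

    # Hypothetical sketch of alternating per-state exploitation (EI) and
    # exploration (ER) strategies, inspired by the abstract. Parameter
    # names and the switching rule are illustrative assumptions only.
    class EIERAgent:
        def __init__(self, actions, error_threshold=0.1):
            self.actions = list(actions)
            self.q = defaultdict(float)        # Q-value estimate per (state, action)
            self.visits = defaultdict(int)     # visit count per (state, action)
            # Estimated model error per state; starts at infinity (unknown region).
            self.est_error = defaultdict(lambda: float("inf"))
            self.error_threshold = error_threshold

        def select_action(self, state):
            if self.est_error[state] > self.error_threshold:
                # ER strategy: the model of this state is still uncertain,
                # so probe the least-visited action to identify the environment.
                return min(self.actions, key=lambda a: self.visits[(state, a)])
            # EI strategy: the model is trusted, so act greedily on past memory.
            return max(self.actions, key=lambda a: self.q[(state, a)])

        def update(self, state, action, reward, alpha=0.5):
            self.visits[(state, action)] += 1
            old = self.q[(state, action)]
            self.q[(state, action)] = old + alpha * (reward - old)
            # Use the one-step prediction error as a crude error estimate,
            # so the state switches to EI once predictions stabilize.
            self.est_error[state] = abs(reward - old)
    ```

    In this sketch a state begins under ER (infinite error estimate) and switches to EI once its prediction error falls below the threshold, which loosely mirrors the reduced redundant searching the abstract claims.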
  • Keywords
    Markov processes; decision theory; learning (artificial intelligence); optimisation; software agents; Markov decision process environments; exploitation learning; exploration learning; optimisation; reinforcement learning; software agents; upper bound; Acceleration; Computer science; Delay effects; Erbium; Estimation error; Learning; State estimation; State-space methods; Switches;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Industrial Electronics Society, 2000. IECON 2000. 26th Annual Conference of the IEEE
  • Conference_Location
    Nagoya
  • Print_ISBN
    0-7803-6456-2
  • Type

    conf

  • DOI
    10.1109/IECON.2000.973141
  • Filename
    973141