• DocumentCode
    349969
  • Title
    Proposal for an algorithm to improve a rational policy in POMDPs
  • Author
    Miyazaki, Kazuteru; Kobayashi, Shigenobu
  • Author_Institution
    Int. Grad. Sch. of Sci. & Eng., Tokyo Inst. of Technol., Yokohama, Japan
  • Volume
    5
  • fYear
    1999
  • fDate
    1999
  • Firstpage
    492
  • Abstract
    Reinforcement learning is a kind of machine learning. The partially observable Markov decision process (POMDP) is a representative class of non-Markovian environments in reinforcement learning. The rational policy making (RPM) algorithm learns a deterministic rational policy in POMDPs. Though RPM can learn a policy very quickly, it needs numerous trials to improve the policy. Furthermore, RPM does not apply to the class of environments in which no deterministic rational policy exists. In this paper, we propose the rational policy improvement (RPI) algorithm, which combines RPM and the mark transit algorithm with a χ2 goodness-of-fit test. RPI can learn a deterministic or stochastic rational policy in POMDPs. RPI is applied to maze environments. We show that RPI learns the most stable rational policy in comparison with other methods.
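    The abstract mentions that RPI uses a χ2 goodness-of-fit test when deciding between deterministic and stochastic rational policies. As a rough illustration only (the function names, the uniform-expectation model, and the critical value below are assumptions, not details from the paper), such a test can check whether the observed action counts at an observation are consistent with a uniformly stochastic choice or dominated by one action:

    ```python
    # Illustrative sketch of a chi-squared goodness-of-fit check of the kind
    # the abstract alludes to: deciding whether action counts at an observation
    # look uniformly stochastic or concentrated on one action.
    # All names and thresholds here are hypothetical, not taken from the paper.

    def chi_squared_statistic(observed, expected):
        """Pearson's chi-squared statistic: sum((O - E)^2 / E)."""
        return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

    def fits_uniform(action_counts, critical_value):
        """True if counts are consistent with a uniform (stochastic) choice,
        i.e. the chi-squared statistic stays below the critical value."""
        total = sum(action_counts)
        k = len(action_counts)
        expected = [total / k] * k
        return chi_squared_statistic(action_counts, expected) <= critical_value

    # Example: 3 actions; 5.991 is the chi-squared critical value for
    # 2 degrees of freedom at significance level 0.05.
    print(fits_uniform([34, 30, 36], 5.991))  # near-uniform counts -> True
    print(fits_uniform([90, 5, 5], 5.991))    # one dominant action -> False
    ```

    The critical value would come from a standard χ2 table for the chosen significance level and (number of actions − 1) degrees of freedom.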
  • Keywords
    Markov processes; decision theory; learning (artificial intelligence); learning systems; observability; machine learning; partially observable Markov decision process; rational policy improvement algorithm; rational policy making algorithm; reinforcement learning; Ear; Economic indicators; Hardware; History; Machine learning algorithms; Proposals; Stochastic processes; Testing;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    1999 IEEE International Conference on Systems, Man, and Cybernetics (IEEE SMC '99 Conference Proceedings)
  • Conference_Location
    Tokyo
  • ISSN
    1062-922X
  • Print_ISBN
    0-7803-5731-0
  • Type
    conf
  • DOI
    10.1109/ICSMC.1999.815600
  • Filename
    815600