DocumentCode
349969
Title
Proposal for an algorithm to improve a rational policy in POMDPs
Author
Miyazaki, Kazuteru ; Kobayashi, Shigenobu
Author_Institution
Int. Grad. Sch. of Sci. & Eng., Tokyo Inst. of Technol., Yokohama, Japan
Volume
5
fYear
1999
fDate
1999
Firstpage
492
Abstract
Reinforcement learning is a kind of machine learning. The partially observable Markov decision process (POMDP) is a representative class of non-Markovian environments in reinforcement learning. The rational policy making (RPM) algorithm learns a deterministic rational policy in POMDPs. Although RPM can learn a policy very quickly, it needs numerous trials to improve the policy. Furthermore, RPM cannot be applied to the class of environments in which no deterministic rational policy exists. In this paper, we propose the rational policy improvement (RPI) algorithm, which combines RPM with the mark transit algorithm using a χ² goodness-of-fit test. RPI can learn a deterministic or stochastic rational policy in POMDPs. RPI is applied to maze environments, and we show that it learns the most stable rational policy in comparison with other methods.
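The abstract's χ² goodness-of-fit test (used within the mark transit step to decide whether an observation confounds several underlying states, so that a single deterministic action choice may be irrational) can be illustrated with a small sketch. The code below is an illustrative assumption rather than the authors' implementation: it compares successor-outcome frequencies from a new batch of trials against a reference distribution estimated from earlier trials; the function name looks_aliased, the outcome labels, and the Laplace smoothing are all hypothetical choices.

from collections import Counter
from scipy.stats import chisquare

def looks_aliased(ref_counts, new_counts, alpha=0.05):
    """Chi-squared goodness-of-fit test: do outcomes observed in a new
    batch of trials (same observation, same action) match the reference
    distribution estimated from earlier trials?

    A significant mismatch suggests the observation aliases several
    hidden states, so a stochastic policy becomes a candidate there.
    Laplace (+1) smoothing on the reference counts avoids zero expected
    frequencies (an illustrative choice, not from the paper).
    """
    categories = sorted(set(ref_counts) | set(new_counts))
    ref = [ref_counts.get(c, 0) + 1 for c in categories]   # smoothed reference
    ref_total = sum(ref)
    new_total = sum(new_counts.values())
    expected = [r / ref_total * new_total for r in categories and ref]
    observed = [new_counts.get(c, 0) for c in categories]
    _, p_value = chisquare(f_obs=observed, f_exp=expected)
    return p_value < alpha   # reject "same distribution" -> flag observation

# Example: outcome frequencies flip between batches -> flagged as aliased.
ref = Counter({"reached_goal": 30, "hit_wall": 10})
new = Counter({"reached_goal": 10, "hit_wall": 30})
print(looks_aliased(ref, new))  # True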
Keywords
Markov processes; decision theory; learning (artificial intelligence); learning systems; observability; machine learning; partially observable Markov decision process; rational policy improvement algorithm; rational policy making algorithm; reinforcement learning
fLanguage
English
Publisher
ieee
Conference_Title
1999 IEEE International Conference on Systems, Man, and Cybernetics (IEEE SMC '99) Conference Proceedings
Conference_Location
Tokyo
ISSN
1062-922X
Print_ISBN
0-7803-5731-0
Type
conf
DOI
10.1109/ICSMC.1999.815600
Filename
815600