DocumentCode
349969
Title
Proposal for an algorithm to improve a rational policy in POMDPs
Author
Miyazaki, Kazuteru ; Kobayashi, Shigenobu
Author_Institution
Int. Grad. Sch. of Sci. & Eng., Tokyo Inst. of Technol., Yokohama, Japan
Volume
5
fYear
1999
fDate
1999
Firstpage
492
Abstract
Reinforcement learning is a kind of machine learning. The partially observable Markov decision process (POMDP) is a representative class of non-Markovian environments in reinforcement learning. The rational policy making (RPM) algorithm learns a deterministic rational policy in POMDPs. Although RPM can learn a policy very quickly, it needs numerous trials to improve the policy. Furthermore, RPM cannot be applied to the class of environments in which no deterministic rational policy exists. In this paper, we propose the rational policy improvement (RPI) algorithm, which combines RPM with the mark transit algorithm using a χ² goodness-of-fit test. RPI can learn a deterministic or stochastic rational policy in POMDPs. RPI is applied to maze environments, and we show that it learns the most stable rational policy in comparison with other methods.
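The abstract's χ² goodness-of-fit test (used within the mark transit step to decide whether an observation confounds several underlying states, so that a single deterministic action choice may be irrational) can be illustrated with a small sketch. The code below is an illustrative assumption rather than the authors' implementation: it compares successor-outcome frequencies from a new batch of trials against a reference distribution estimated from earlier trials; the function name looks_aliased, the outcome labels, and the Laplace smoothing are all hypothetical choices.

from collections import Counter
from scipy.stats import chisquare

def looks_aliased(ref_counts, new_counts, alpha=0.05):
    """Chi-squared goodness-of-fit test: do outcomes observed in a new
    batch of trials (same observation, same action) match the reference
    distribution estimated from earlier trials?

    A significant mismatch suggests the observation aliases several
    hidden states, so a stochastic policy becomes a candidate there.
    Laplace (+1) smoothing on the reference counts avoids zero expected
    frequencies (an illustrative choice, not from the paper).
    """
    categories = sorted(set(ref_counts) | set(new_counts))
    ref = [ref_counts.get(c, 0) + 1 for c in categories]   # smoothed reference
    ref_total = sum(ref)
    new_total = sum(new_counts.values())
    expected = [r / ref_total * new_total for r in categories and ref]
    observed = [new_counts.get(c, 0) for c in categories]
    _, p_value = chisquare(f_obs=observed, f_exp=expected)
    return p_value < alpha   # reject "same distribution" -> flag observation

# Example: outcome frequencies flip between batches -> flagged as aliased.
ref = Counter({"reached_goal": 30, "hit_wall": 10})
new = Counter({"reached_goal": 10, "hit_wall": 30})
print(looks_aliased(ref, new))  # True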
Keywords
Markov processes; decision theory; learning (artificial intelligence); learning systems; observability; machine learning; partially observable Markov decision process; rational policy improvement algorithm; rational policy making algorithm; reinforcement learning
fLanguage
English
Publisher
ieee
Conference_Title
1999 IEEE International Conference on Systems, Man, and Cybernetics (IEEE SMC '99) Conference Proceedings
Conference_Location
Tokyo
ISSN
1062-922X
Print_ISBN
0-7803-5731-0
Type
conf
DOI
10.1109/ICSMC.1999.815600
Filename
815600