DocumentCode
3723128
Title
ACIS: An Improved Actor-Critic Method for POMDPs with Internal State
Author
Dan Xu;Quan Liu
Author_Institution
Comput. Sci. &
fYear
2015
Firstpage
369
Lastpage
376
Abstract
Partially observable Markov decision processes (POMDPs) provide a rich mathematical model for sequential decision making in partially observable and stochastic environments. Model-free methods use an internal state as a substitute for the belief state, which in model-based techniques is a sufficient statistic of the entire past action-observation history. A main drawback of previous model-free techniques, such as direct policy gradient methods, is that their solutions often suffer from high variance in the gradient estimate. This paper proposes a novel algorithm, Actor-Critic with Internal State (ACIS), to reduce the variance of policy gradient methods. ACIS draws its power from the actor-critic (AC) framework: the actor updates the parameters of the policy functions, and the critic uses temporal-difference (TD) learning to evaluate the current policy. Empirically, ACIS outperforms state-of-the-art model-free methods such as IState-GPOMDP in terms of both variance and final reward on the Load-Unload and Robot Navigation problems.
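To make the actor/critic split described in the abstract concrete, here is a minimal sketch of a generic one-step actor-critic with a finite internal state. It is not the authors' ACIS algorithm; it only assumes an IState-GPOMDP-style controller (a softmax distribution over the next internal state given the current internal state and observation, and a softmax action distribution given the new internal state and observation) paired with a tabular TD(0) critic over (internal state, observation) pairs. The corridor environment, its length, the reward of 1 per delivery, the step sizes, and all function names (`LoadUnload`, `actor_critic_istate`, etc.) are hypothetical choices for illustration.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def sample(probs, rng):
    """Draw an index from a discrete distribution."""
    r, acc = rng.random(), 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i
    return len(probs) - 1


class LoadUnload:
    """Hypothetical toy corridor in the spirit of Load-Unload.

    Position 0 is the unload end and position length-1 the load end. The agent
    only observes 0 (unload end), 1 (middle) or 2 (load end), so it needs an
    internal state to remember whether it is carrying a load.
    """

    def __init__(self, length=5):
        self.length = length
        self.pos, self.loaded = 0, False

    def observe(self):
        if self.pos == 0:
            return 0
        return 2 if self.pos == self.length - 1 else 1

    def step(self, action):  # action: 0 = left, 1 = right
        self.pos = max(0, min(self.length - 1, self.pos + (1 if action else -1)))
        reward = 0.0
        if self.pos == self.length - 1:
            self.loaded = True
        elif self.pos == 0 and self.loaded:
            self.loaded, reward = False, 1.0  # successful delivery
        return reward, self.observe()


def actor_critic_istate(steps=20000, n_istates=2, alpha=0.05, beta=0.1,
                        gamma=0.95, seed=0):
    """One-step actor-critic over (internal state, observation) pairs."""
    rng = random.Random(seed)
    env, n_obs, n_act = LoadUnload(), 3, 2
    # Actor: logits for internal-state transitions w(g'|g, y) and actions mu(u|g', y).
    w_g = [[[0.0] * n_istates for _ in range(n_obs)] for _ in range(n_istates)]
    w_u = [[[0.0] * n_act for _ in range(n_obs)] for _ in range(n_istates)]
    # Critic: tabular value estimate V(g, y).
    v = [[0.0] * n_obs for _ in range(n_istates)]
    g, y, total = 0, env.observe(), 0.0
    for _ in range(steps):
        p_g = softmax(w_g[g][y])
        g_next = sample(p_g, rng)          # sample next internal state
        p_u = softmax(w_u[g_next][y])
        u = sample(p_u, rng)               # sample action
        r, y_next = env.step(u)
        total += r
        # Critic: TD(0) error and value update.
        delta = r + gamma * v[g_next][y_next] - v[g][y]
        v[g][y] += beta * delta
        # Actor: grad of log-softmax is one_hot(choice) - probs, scaled by the TD error.
        for k in range(n_istates):
            w_g[g][y][k] += alpha * delta * (float(k == g_next) - p_g[k])
        for k in range(n_act):
            w_u[g_next][y][k] += alpha * delta * (float(k == u) - p_u[k])
        g, y = g_next, y_next
    return total / steps, v
```

Calling `avg, v = actor_critic_istate(steps=20000)` returns the average reward per step and the learned critic table. The design choice worth noting is that the TD error `delta` replaces a Monte Carlo return as the scaling factor in the actor update, which is the usual way an actor-critic method aims to lower gradient variance relative to a pure policy gradient estimator.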
Keywords
"Mathematical model","Gradient methods","Convergence","Markov processes","History","Load modeling","Computer science"
Publisher
ieee
Conference_Titel
Tools with Artificial Intelligence (ICTAI), 2015 IEEE 27th International Conference on
ISSN
1082-3409
Type
conf
DOI
10.1109/ICTAI.2015.63
Filename
7372159