Title :
ACIS: An Improved Actor-Critic Method for POMDPs with Internal State
Author_Institution :
Comput. Sci. &
Abstract :
Partially observable Markov decision processes (POMDPs) provide a rich mathematical model for sequential decision making in stochastic, partially observable environments. Model-free methods use an internal state as a substitute for the belief state, which in model-based techniques is a sufficient statistic of the entire past action-observation history. A main drawback of previous model-free techniques, such as direct policy gradient methods, is that their solutions often suffer from high variance in the gradient estimate. This paper proposes a novel algorithm, Actor-Critic with Internal State (ACIS), to reduce the variance of policy gradient methods. ACIS draws its power from the actor-critic (AC) framework: the actor updates the parameters of the policy function, while the critic uses temporal-difference learning to evaluate the current policy. Empirically, ACIS outperforms state-of-the-art model-free methods, such as IState-GPOMDP, in both variance and final reward on the Load-Unload and Robot Navigation problems.
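The actor-critic structure described in the abstract — an actor taking policy-gradient steps and a critic computing a temporal-difference error to evaluate the current policy — can be sketched minimally as below. This is a generic tabular AC update under assumed hyperparameters, not the paper's ACIS algorithm (which additionally maintains an internal state for partial observability); all function and variable names are illustrative.

```python
import numpy as np

def softmax(prefs):
    """Numerically stable softmax over action preferences."""
    z = prefs - prefs.max()
    e = np.exp(z)
    return e / e.sum()

def actor_critic_step(theta, V, s, a, r, s_next,
                      alpha_actor=0.1, alpha_critic=0.1, gamma=0.99):
    """One generic actor-critic update (sketch, not ACIS itself).

    theta : (n_states, n_actions) policy preferences (actor parameters)
    V     : (n_states,) state-value estimates (critic)
    """
    # Critic: temporal-difference error evaluates the current policy.
    delta = r + gamma * V[s_next] - V[s]
    V[s] += alpha_critic * delta

    # Actor: gradient of log pi(a|s) for a softmax policy, scaled by the
    # TD error, which acts as a lower-variance substitute for the raw return.
    pi = softmax(theta[s])
    grad_log = -pi
    grad_log[a] += 1.0
    theta[s] += alpha_actor * delta * grad_log
    return theta, V, delta
```

Using the TD error in place of a full Monte Carlo return is the standard mechanism by which actor-critic methods reduce gradient variance relative to direct policy gradient approaches, which is the effect the abstract attributes to ACIS.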
Keywords :
"Mathematical model","Gradient methods","Convergence","Markov processes","History","Load modeling","Computer science"
Conference_Titel :
Tools with Artificial Intelligence (ICTAI), 2015 IEEE 27th International Conference on
DOI :
10.1109/ICTAI.2015.63