• DocumentCode
    3723128
  • Title
    ACIS: An Improved Actor-Critic Method for POMDPs with Internal State
  • Author
    Dan Xu;Quan Liu
  • Author_Institution
    Comput. Sci. &
  • fYear
    2015
  • Firstpage
    369
  • Lastpage
    376
  • Abstract
    Partially observable Markov decision processes (POMDPs) provide a rich mathematical model for sequential decision making in partially observable and stochastic environments. Model-free methods use an internal state as a substitute for the belief state, which in model-based techniques is a sufficient statistic of the entire past action-observation history. A main drawback of previous model-free techniques, such as direct policy gradient methods, is that their solutions often suffer from high variance in the gradient estimate. This paper proposes a novel algorithm, Actor-Critic with Internal State (ACIS), to reduce this variance within the policy gradient approach. ACIS gets its power from the actor-critic (AC) framework: the actor updates the parameters of the policy functions, and the critic uses the temporal difference to evaluate the current policy. Empirically, ACIS shows better performance than state-of-the-art model-free methods, such as IState-GPOMDP, in terms of variance and final reward on the Load-Unload and Robot Navigation problems.
  • Keywords
    "Mathematical model","Gradient methods","Convergence","Markov processes","History","Load modeling","Computer science"
  • Publisher
    IEEE
  • Conference_Titel
    Tools with Artificial Intelligence (ICTAI), 2015 IEEE 27th International Conference on
  • ISSN
    1082-3409
  • Type
    conf
  • DOI
    10.1109/ICTAI.2015.63
  • Filename
    7372159