• DocumentCode
    1984366
  • Title

    The policy gradient estimation of continuous-time hidden Markov decision processes

  • Author

    Li Yanjie ; Baoqun, Yin ; Hongsheng, Xi

  • Author_Institution
    Dept. of Autom., Univ. of Sci. & Technol. of China, China
  • fYear
    2005
  • fDate
    27 June-3 July 2005
  • Abstract
    Recently, gradient-based methods have received much attention for optimizing dynamic systems with hidden information, such as routing problems in robotic systems. In this paper, we present the continuous-time hidden Markov decision process (CTHMDP), which can be used to model such robotic systems, and study the problem of policy gradient estimation for this process. First, an approximation formula for the gradient is presented. Then, using the uniformization method, we introduce an algorithm that can be considered an extension of the gradient of partially observable Markov decision process (GPOMDP) algorithm to the continuous-time model. Finally, the convergence and error bound of the algorithm are considered.
  • Keywords
    continuous time systems; decision theory; discrete event systems; estimation theory; gradient methods; hidden Markov models; robots; continue time model; dynamic system; gradient based method; gradient of partially observable Markov decision process algorithm; hidden Markov decision process; policy gradient estimation; robotic system; uniformization method; Approximation algorithms; Convergence; Cost function; Function approximation; Hidden Markov models; Learning systems; Optimization methods; Probability distribution; Robotics and automation; Robots;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Information Acquisition, 2005 IEEE International Conference on
  • Print_ISBN
    0-7803-9303-1
  • Type

    conf

  • DOI
    10.1109/ICIA.2005.1635101
  • Filename
    1635101