• DocumentCode
    391043
  • Title
    Gradient-based policy iteration: an example
  • Author
    Cao, Xi-Ren; Fang, Hai-Tao
  • Author_Institution
    Dept. of Electr. & Electron. Eng., Hong Kong Univ. of Sci. & Technol., China
  • Volume
    3
  • fYear
    2002
  • fDate
    10-13 Dec. 2002
  • Firstpage
    3367
  • Abstract
    Research indicates that perturbation analysis (PA), Markov decision processes (MDP), and reinforcement learning (RL) are three closely related areas in discrete event dynamic system optimization. In particular, it has been shown that policy iteration in fact chooses, for the next iteration, the policy with the steepest performance gradient (provided by PA). This sensitivity-based view of MDPs leads to some new research topics. We propose to implement policy iteration based on performance gradients. This approach is particularly useful when the actions at different states are correlated and hence standard policy iteration cannot be applied. We illustrate the main ideas with an example of an M/G/1/N queue and identify some further research topics.
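    A minimal sketch in Python of the gradient-based policy-iteration idea described above. It is an illustration under assumed data, not the authors' algorithm or their M/G/1/N example: the two-state transition matrices and reward vectors are made up, and the update simply moves to the candidate policy with the largest performance derivative, computed from the current policy's potentials (Poisson equation) via the perturbation-analysis formula dη/dδ = π[(P' − P)g + (f' − f)].

```python
# Minimal sketch (assumed data): gradient-based policy iteration on an ergodic
# finite-state Markov chain.  Each candidate "policy" is a whole (P, f) pair, so
# actions at different states may be correlated; the update picks the candidate
# with the steepest performance derivative instead of improving state by state.
import numpy as np

def stationary_dist(P):
    """Stationary distribution pi of an ergodic transition matrix P (pi P = pi, pi e = 1)."""
    n = P.shape[0]
    A = np.vstack([P.T - np.eye(n), np.ones(n)])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    return np.linalg.lstsq(A, b, rcond=None)[0]

def potentials(P, f):
    """Solve the Poisson equation (I - P + e pi) g = f for the performance potentials g."""
    n = P.shape[0]
    pi = stationary_dist(P)
    g = np.linalg.solve(np.eye(n) - P + np.outer(np.ones(n), pi), f)
    return pi, g

def gradient_policy_iteration(policies, start=0, tol=1e-10, max_iter=100):
    """policies: list of (P, f) pairs; returns (index of final policy, its average reward)."""
    k = start
    for _ in range(max_iter):
        P, f = policies[k]
        pi, g = potentials(P, f)
        # Performance derivative toward each candidate policy (PA formula):
        #   d(eta)/d(delta) = pi [ (P' - P) g + (f' - f) ]
        derivs = [pi @ ((Pp - P) @ g + (fp - f)) for Pp, fp in policies]
        best = int(np.argmax(derivs))
        if derivs[best] <= tol:        # no ascent direction left: stop at the current policy
            return k, float(pi @ f)
        k = best
    P, f = policies[k]
    pi, _ = potentials(P, f)
    return k, float(pi @ f)

# Illustrative two-state example with three candidate policies (made-up numbers).
policies = [
    (np.array([[0.9, 0.1], [0.2, 0.8]]), np.array([1.0, 0.0])),
    (np.array([[0.5, 0.5], [0.5, 0.5]]), np.array([0.8, 0.6])),
    (np.array([[0.3, 0.7], [0.6, 0.4]]), np.array([0.2, 1.0])),
]
print(gradient_policy_iteration(policies))
```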
  • Keywords
    Markov processes; decision theory; discrete event systems; gradient methods; iterative methods; learning (artificial intelligence); probability; queueing theory; M/G/1/N queue; Markov decision processes; Q-learning; W-factors; discrete event dynamic system optimization; gradient-based policy iteration; performance gradients; perturbation analysis; reinforcement learning; sensitivity; Control systems; Convergence; Laboratories; Mathematics; Optimization; Performance analysis; Poisson equations; Stochastic processes; System performance; User-generated content;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Proceedings of the 41st IEEE Conference on Decision and Control, 2002
  • ISSN
    0191-2216
  • Print_ISBN
    0-7803-7516-5
  • Type
    conf
  • DOI
    10.1109/CDC.2002.1184395
  • Filename
    1184395