• DocumentCode
    3091545
  • Title

    A sample path theory for time-average Markov decision processes

  • Author

    Ross, K.W. ; Varadarajan, R.

  • Author_Institution
    University of Pennsylvania, Philadelphia, PA
  • Volume
    26
  • fYear
    1987
  • fDate
    9-11 Dec. 1987
  • Firstpage
    2264
  • Lastpage
    2269
  • Abstract
    Considered are time-average Markov Decision Processes (MDPs) with finite state and action spaces. It is shown that the state space has a natural partition into strongly communicating classes and a set of states which is transient under all stationary policies. For every policy, any associated recurrent class must be a subset of one of the strongly communicating classes; moreover, there exists a stationary policy whose recurrent classes are the strongly communicating classes. A polynomial-time algorithm is given to determine the partition. The decomposition theory is utilized to investigate MDPs with a sample-path constraint. Here, both a cost and a reward are accumulated at each decision epoch. A policy is feasible if the time-average cost is below a specified value with probability one. The optimization problem is to maximize the expected average reward over all feasible policies. For MDPs with arbitrary recurrent structures, it is shown that there exists an ε-optimal stationary policy for each ε > 0 if and only if there exists a feasible policy. Further, verifiable conditions are given for the existence of an optimal stationary policy.
  • Keywords
    Constraint theory; Costs; Graph theory; Partitioning algorithms; Polynomials; State-space methods;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Decision and Control, 1987. 26th IEEE Conference on
  • Conference_Location
    Los Angeles, California, USA
  • Type

    conf

  • DOI
    10.1109/CDC.1987.272945
  • Filename
    4049710
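
The partition described in the abstract can be illustrated with a small sketch. This is not taken from the paper: it assumes the well-known iterative refinement that alternates between pruning actions that can leave a candidate set and splitting the set into strongly connected components, which may differ from the authors' polynomial-time algorithm. The function names and the input layout `P[state][action] = {next_state: probability}` are hypothetical choices made for this illustration.

```python
def reachable(start, edges):
    """Return every node reachable from `start` (including itself)."""
    seen = {start}
    stack = [start]
    while stack:
        u = stack.pop()
        for v in edges[u]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


def strongly_connected_components(nodes, edges):
    """Simple polynomial-time SCC computation via mutual reachability."""
    reach = {u: reachable(u, edges) for u in nodes}
    remaining = set(nodes)
    components = []
    while remaining:
        u = next(iter(remaining))
        comp = {v for v in reach[u] if u in reach[v]}
        components.append(comp)
        remaining -= comp
    return components


def closed_actions(P, s, C):
    """Actions at state s whose positive-probability successors all lie in C."""
    return [a for a in P[s]
            if all(t in C for t, p in P[s][a].items() if p > 0)]


def partition_states(P):
    """Split states into candidate strongly communicating classes and a
    remainder that is transient under every stationary policy (assumed
    refinement scheme, see the lead-in)."""
    classes = []
    work = [set(P)]
    while work:
        C = work.pop()
        # Drop states that have no action keeping the process inside C.
        while True:
            dead = {s for s in C if not closed_actions(P, s, C)}
            if not dead:
                break
            C -= dead
        if not C:
            continue
        # Graph on C that uses only actions which cannot leave C.
        edges = {s: {t for a in closed_actions(P, s, C)
                     for t, p in P[s][a].items() if p > 0}
                 for s in C}
        comps = strongly_connected_components(C, edges)
        if len(comps) == 1:
            classes.append(frozenset(C))
        else:
            work.extend(comps)
    covered = set().union(*classes) if classes else set()
    return classes, set(P) - covered


# Hypothetical four-state example used only to exercise the sketch.
P_example = {
    0: {"a": {1: 1.0}, "b": {0: 0.5, 2: 0.5}},
    1: {"a": {0: 1.0}},
    2: {"a": {3: 1.0}},
    3: {"a": {3: 1.0}},
}
example_classes, example_transient = partition_states(P_example)
```

In this example, states 0 and 1 form one class (under action "a" at state 0), state 3 forms another, and state 2 is left over because its only action leads out of every set containing it.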