DocumentCode :
3091545
Title :
A sample path theory for time-average Markov decision processes
Author :
Ross, K.W. ; Varadarajan, R.
Author_Institution :
University of Pennsylvania, Philadelphia, PA
Volume :
26
fYear :
1987
fDate :
9-11 Dec. 1987
Firstpage :
2264
Lastpage :
2269
Abstract :
Considered are time-average Markov Decision Processes (MDPs) with finite state and action spaces. It is shown that the state space has a natural partition into strongly communicating classes and a set of states which is transient under all stationary policies. For every policy, any associated recurrent class must be a subset of one of the strongly communicating classes; moreover, there exists a stationary policy whose recurrent classes are the strongly communicating classes. A polynomial-time algorithm is given to determine the partition. The decomposition theory is utilized to investigate MDPs with a sample-path constraint. Here, both a cost and a reward are accumulated at each decision epoch. A policy is feasible if the time-average cost is below a specified value with probability one. The optimization problem is to maximize the expected average reward over all feasible policies. For MDPs with arbitrary recurrent structures, it is shown that there exists an ε-optimal stationary policy for each ε > 0 if and only if there exists a feasible policy. Further, verifiable conditions are given for the existence of an optimal stationary policy.
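The partition described in the abstract can be illustrated with a first-pass approximation: take the directed graph whose edges are all one-step transitions possible under some action, compute its strongly connected components, and keep those components that some choice of actions can keep closed; all remaining states are transient under every stationary policy. This is a minimal sketch, not the paper's algorithm — the actual strongly communicating classes require a finer analysis (a closable SCC below may still split into several recurrent classes under the chosen policy), and the toy MDP `P` is purely illustrative.

```python
# Hypothetical toy MDP (illustrative only): P[s][a] maps next states to probabilities.
P = {
    0: {"a": {1: 1.0}},
    1: {"a": {0: 0.5, 2: 0.5}},
    2: {"a": {2: 1.0}, "b": {1: 1.0}},
    3: {"a": {0: 1.0}},
}

def union_graph(P):
    """Edge s -> t if some action moves s to t with positive probability."""
    return {s: {t for a in P[s] for t, p in P[s][a].items() if p > 0} for s in P}

def sccs(adj):
    """Strongly connected components via Kosaraju's two-pass algorithm."""
    order, seen = [], set()
    for root in adj:                      # first pass: record finish order
        if root in seen:
            continue
        seen.add(root)
        stack = [(root, iter(adj[root]))]
        while stack:
            v, it = stack[-1]
            nxt = next(it, None)
            if nxt is None:
                order.append(v)
                stack.pop()
            elif nxt not in seen:
                seen.add(nxt)
                stack.append((nxt, iter(adj[nxt])))
    radj = {u: set() for u in adj}        # reversed graph
    for u in adj:
        for v in adj[u]:
            radj[v].add(u)
    comps, assigned = [], set()
    for u in reversed(order):             # second pass: collect components
        if u in assigned:
            continue
        comp, stack = set(), [u]
        assigned.add(u)
        while stack:
            v = stack.pop()
            comp.add(v)
            for w in radj[v]:
                if w not in assigned:
                    assigned.add(w)
                    stack.append(w)
        comps.append(comp)
    return comps

def decompose(P):
    """Closable SCCs (candidate communicating classes) vs. states transient everywhere."""
    closed, transient = [], set()
    for comp in sccs(union_graph(P)):
        # comp is closable if every state has an action whose support stays inside comp
        if all(any(set(P[s][a]) <= comp for a in P[s]) for s in comp):
            closed.append(comp)
        else:
            transient |= comp
    return closed, transient

closed, transient = decompose(P)
# States 0, 1, 2 communicate and can be kept closed; state 3 must leave and is transient.
```

Any state outside every closable component (here, state 3, whose only action exits) is transient under all stationary policies, matching the structural claim in the abstract.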
Keywords :
Constraint theory; Costs; Graph theory; Partitioning algorithms; Polynomials; State-space methods;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
Decision and Control, 1987. 26th IEEE Conference on
Conference_Location :
Los Angeles, California, USA
Type :
conf
DOI :
10.1109/CDC.1987.272945
Filename :
4049710