DocumentCode
31440
Title
Optimization of Average Rewards of Time Nonhomogeneous Markov Chains
Author
Xi-Ren Cao
Author_Institution
Dept. of Finance, Shanghai Jiao Tong Univ., Shanghai, China
Volume
60
Issue
7
fYear
2015
fDate
Jul-15
Firstpage
1841
Lastpage
1856
Abstract
We study the optimization of average rewards of discrete time nonhomogeneous Markov chains, in which the state spaces, transition probabilities, and reward functions depend on time. The analysis encounters a few major difficulties: 1) Notions crucial to homogeneous Markov chains, such as ergodicity, stationarity, periodicity, and connectivity, no longer apply; 2) The average reward criterion is under-selective; i.e, it does not depend on the decisions in any finite period, and thus dynamic programming is not amenable; and 3) Because of the under-selectivity, an optimal average-reward policy may not be the best in any finite period. These issues are resolved by 1) We discover that a new notion, called “confluencity”, is the base for optimization of average rewards of Markov chains. Confluencity refers to the property that two independent sample paths of a Markov chain starting from any two different initial states will eventually meet together; 2) We apply the direct-comparison based approach [3] to the average reward optimization and obtain the necessary and sufficient conditions for optimal policies; and 3) We study the bias optimality with bias measuring the transient reward; we show that for the transient reward to be optimal, one additional condition based on bias potentials is required.
Keywords
Markov processes; discrete time systems; dynamic programming; probability; state-space methods; average reward optimization; bias optimality; bias potential; confluencity; discrete time nonhomogeneous Markov chain; dynamic programming; necessary and sufficient condition; optimal average-reward policy; optimal policy; reward function; state space; transient reward; transition probability; Couplings; Dynamic programming; Equations; IEEE Potentials; Markov processes; Optimization; Transient analysis; Bias optimality; Confluencity; HJB equation; bias potential; confluencity; direct-comparison based optimization; performance potential; weak ergodicity; weak recurrence;
fLanguage
English
Journal_Title
Automatic Control, IEEE Transactions on
Publisher
ieee
ISSN
0018-9286
Type
jour
DOI
10.1109/TAC.2015.2394951
Filename
7017536
Link To Document