• DocumentCode
    31440
  • Title

    Optimization of Average Rewards of Time Nonhomogeneous Markov Chains

  • Author

    Xi-Ren Cao

  • Author_Institution
    Dept. of Finance, Shanghai Jiao Tong Univ., Shanghai, China
  • Volume
    60
  • Issue
    7
  • fYear
    2015
  • fDate
    Jul-15
  • Firstpage
    1841
  • Lastpage
    1856
  • Abstract
    We study the optimization of average rewards of discrete time nonhomogeneous Markov chains, in which the state spaces, transition probabilities, and reward functions depend on time. The analysis encounters a few major difficulties: 1) Notions crucial to homogeneous Markov chains, such as ergodicity, stationarity, periodicity, and connectivity, no longer apply; 2) The average reward criterion is under-selective; i.e, it does not depend on the decisions in any finite period, and thus dynamic programming is not amenable; and 3) Because of the under-selectivity, an optimal average-reward policy may not be the best in any finite period. These issues are resolved by 1) We discover that a new notion, called “confluencity”, is the base for optimization of average rewards of Markov chains. Confluencity refers to the property that two independent sample paths of a Markov chain starting from any two different initial states will eventually meet together; 2) We apply the direct-comparison based approach [3] to the average reward optimization and obtain the necessary and sufficient conditions for optimal policies; and 3) We study the bias optimality with bias measuring the transient reward; we show that for the transient reward to be optimal, one additional condition based on bias potentials is required.
  • Keywords
    Markov processes; discrete time systems; dynamic programming; probability; state-space methods; average reward optimization; bias optimality; bias potential; confluencity; discrete time nonhomogeneous Markov chain; dynamic programming; necessary and sufficient condition; optimal average-reward policy; optimal policy; reward function; state space; transient reward; transition probability; Couplings; Dynamic programming; Equations; IEEE Potentials; Markov processes; Optimization; Transient analysis; Bias optimality; Confluencity; HJB equation; bias potential; confluencity; direct-comparison based optimization; performance potential; weak ergodicity; weak recurrence;
  • fLanguage
    English
  • Journal_Title
    Automatic Control, IEEE Transactions on
  • Publisher
    ieee
  • ISSN
    0018-9286
  • Type

    jour

  • DOI
    10.1109/TAC.2015.2394951
  • Filename
    7017536