Title :
Partial-Information State-Based Optimization of Partially Observable Markov Decision Processes and the Separation Principle
Author :
Xi-Ren Cao ; De-Xin Wang ; Li Qiu
Author_Institution :
Dept. of Finance, Shanghai Jiao Tong Univ., Shanghai, China
Abstract :
We propose a partial-information-state-based approach to the optimization of the long-run average performance of a partially observable Markov decision process (POMDP). In this approach, the information history is summarized, possibly only partially, by one or a few statistics, not necessarily sufficient, called a partial-information state, and actions depend on the partial-information state rather than on the system state. We first propose the "single-policy based comparison principle," under which we derive an HJB-type optimality equation and a policy iteration algorithm for the optimal policy in the partial-information-state-based policy space. We then introduce Q-sufficient statistics and show that if the partial-information state is Q-sufficient, then the optimal policy in the partial-information-state-based policy space is optimal in the space of all feasible information-state-based policies. We show that, under some further conditions, the well-known separation principle holds. The results are obtained by applying the direct comparison based approach, initially developed for discrete event dynamic systems.
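The direct comparison approach mentioned above rests on performance potentials. For two stationary policies with average rewards η and η', the standard performance difference formula reads η' − η = Σ_i π'(i)[(r'(i) + Σ_j p'(i,j) g(j)) − (r(i) + Σ_j p(i,j) g(j))], so choosing actions that maximize r(i,a) + Σ_j p^a(i,j) g(j) cannot decrease η. As a hedged illustration, the sketch below shows only the classical fully observable case: a finite, unichain Markov decision process with long-run average reward, where all actions are available in every state. It is not the paper's partial-information-state algorithm. The function names (`solve`, `evaluate`, `policy_iteration`), the normalization g(0) = 0, and the two-state toy model are assumptions made for illustration.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / M[r][r]
    return x


def evaluate(P, r, policy):
    """Average reward eta and potentials g (normalized so g[0] = 0).

    Solves the Poisson equation g(i) + eta = r(i, d(i)) + sum_j p^{d(i)}(i, j) g(j)
    for a unichain model. Unknown vector: x = [eta, g(1), ..., g(n-1)].
    """
    n = len(policy)
    A, b = [], []
    for i in range(n):
        a = policy[i]
        row = [1.0] + [0.0] * (n - 1)  # coefficient 1 for eta
        for j in range(1, n):
            row[j] = (1.0 if i == j else 0.0) - P[a][i][j]
        A.append(row)
        b.append(r[i][a])
    x = solve(A, b)
    return x[0], [0.0] + x[1:]


def policy_iteration(P, r, policy):
    """Potential-based policy iteration; P[a][i][j] = p^a(i, j), r[i][a] = reward."""
    n_actions = len(P)
    while True:
        eta, g = evaluate(P, r, policy)
        new_policy = []
        for i in range(len(policy)):
            # Q-like score of each action at state i under the current potentials
            q = [r[i][a] + sum(P[a][i][j] * g[j] for j in range(len(g)))
                 for a in range(n_actions)]
            best = max(range(n_actions), key=lambda a: q[a])
            # Keep the current action unless another is strictly better (avoids cycling)
            if q[best] <= q[policy[i]] + 1e-12:
                best = policy[i]
            new_policy.append(best)
        if new_policy == policy:
            return policy, eta, g
        policy = new_policy


# Hypothetical two-state, two-action example
P = [
    [[0.5, 0.5], [0.5, 0.5]],  # action 0: move to either state with equal probability
    [[0.9, 0.1], [0.1, 0.9]],  # action 1: tend to stay in the current state
]
r = [[1.0, 0.0],   # rewards in state 0 for actions 0 and 1
     [3.0, 2.0]]   # rewards in state 1 for actions 0 and 1

policy, eta, g = policy_iteration(P, r, [1, 1])
print(policy, eta, g)
```

Starting from the policy that always takes action 1 (η = 1), the iteration passes through the policy [0, 1] (η = 11/6) and stops at [0, 0] with η = 2, which matches a direct comparison of the four stationary policies in this toy model.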
Keywords :
Markov processes; decision theory; iterative methods; optimisation; statistical analysis; HJB-type optimality equation; POMDP; Q-sufficient statistics; discrete event dynamic systems; information history; long-run average performance; partial-information state-based optimization approach; partially observable Markov decision processes; policy iteration; separation principle; single-policy based comparison principle; Equations; History; Mathematical model; Optimization; Probability distribution; Direct comparison-based approach; HJB equation; Q-factor; finite state controller; performance potential
Journal_Title :
Automatic Control, IEEE Transactions on
DOI :
10.1109/TAC.2013.2293397