Author_Institution :
Shenzhen Grad. Sch., Harbin Inst. of Technol., Shenzhen, China
Abstract :
Performance potential plays an important role in the Markov decision process (MDP) with the discounted- or average-reward criteria. With performance potential as building block, optimization algorithms, such as, policy iteration algorithms and gradient-based algorithms can be developed. Generally, performance potential can be obtained by solving linear equation. However, when state space is very large or transition probabilities are unknown, the solution of performance potential becomes difficult, even impossible. At that cases, the simulation-based estimation is more suitable. Regular Monte Carlo estimates have a variance of O(1/N), where N is the number of sample pathes of the Markov chains. In this paper, we consider a new estimation algorithm of average reward performance potential with geometric variance reduction. The estimates with geometric variance reduction O(ρN) with ρ <; 1 have better convergence rate. By using the relative difference of performance potential, i.e., perturbation realization factor, performance potential can be estimated based on a coupling method, which can further reduce the variance of estimation. The estimation of performance potential in this paper can be applied in the event-based optimization.
Keywords :
Markov processes; convergence; decision theory; estimation theory; optimisation; Markov chains; Markov decision process; Monte Carlo estimates; average reward performance potential; average reward performance potential estimation; average-reward criteria; convergence rate; coupling method; discounted-reward criteria; event-based optimization; geometric variance reduction; linear equation; optimization algorithms; perturbation realization factor; simulation-based estimation; state space; transition probabilities; Couplings; Equations; Estimation; Markov processes; Mathematical model; Optimization; Poisson equations; Estimation with Geometric Variance Reduction; Performance Potential; Perturbation Realization Factor;