Title :
Infinite-Horizon Policy-Gradient Estimation with Variable Discount Factor for Markov Decision Process
Author :
Bao, Bing-Kun ; Yin, Bao-Qun ; Xi, Hong-sheng
Author_Institution :
Dept. of Autom., China Univ. of Sci. & Technol., Hefei
Abstract :
This paper proposes a novel infinite-horizon policy-gradient estimation method with a variable discount factor. By using an incremental sequence as the discount factor, the method addresses a limitation of conventional policy-gradient estimation methods: the imbalance between bias and variance. Numerical experiments on a Markov decision process demonstrate its effectiveness.
Keywords :
Markov processes; decision theory; gradient methods; infinite horizon; Markov decision process; incremental sequence; infinite-horizon policy-gradient estimation; variable discount factor; Approximation algorithms; Automation; Computational modeling; Eigenvalues and eigenfunctions; Optimization methods; State estimation; State-space methods; Stochastic processes
Conference_Title :
3rd International Conference on Innovative Computing Information and Control (ICICIC '08), 2008
Conference_Location :
Dalian, Liaoning
Print_ISBN :
978-0-7695-3161-8
Electronic_ISBN :
978-0-7695-3161-8
DOI :
10.1109/ICICIC.2008.318