DocumentCode :
3495191
Title :
Belief function model for reliable optimal set estimation of transition matrices in discounted infinite-horizon Markov decision processes
Author :
Li, Baohua ; Si, Jennie
Author_Institution :
Dept. of Electr. Eng., Univ. of Arkansas, Fayetteville, AR, USA
fYear :
2011
fDate :
July 31 2011-Aug. 5 2011
Firstpage :
1214
Lastpage :
1221
Abstract :
We study finite-state, finite-action, discounted infinite-horizon Markov decision processes with uncertain correlated transition matrices in deterministic policy spaces. To efficiently implement an approximate robust policy iteration algorithm for computing a robust optimal or near-optimal policy, a reliable and tight set estimate of the parameters of the transition matrix is needed in advance. However, the number of observation samples on state transitions may be small, and prior information on the parameter space may be incomplete or unavailable. In such cases, the commonly used maximum a posteriori (MAP) model may not provide a reliable optimal set estimate of the parameters. In this paper, exploiting the advantages of the Dempster-Shafer theory of evidence over Bayesian theory, a belief function model is proposed based on minimizing the cardinality of a set estimate. This new model can give a more reliable optimal solution to cover the true parameters than the MAP model. It degenerates to the MAP model when prior information on the parameter space is complete, or when prior information is unavailable but the observation samples on state transitions are large enough. Moreover, we introduce a concept of principal components to characterize large observation samples, so that both models yield the same reliable and tight results. The computational complexity of the new model is also discussed.
Keywords :
Markov processes; belief networks; computational complexity; decision theory; inference mechanisms; maximum likelihood estimation; Bayesian theory; Dempster-Shafer theory; approximate robust policy iteration algorithm; belief function model; computation complexity; discounted infinite-horizon Markov decision processes; maximum a posteriori model; optimal set estimation; transition matrices; uncertain correlated transition matrices; Approximation algorithms; Bayesian methods; Computational modeling; Markov processes; Probability distribution; Robustness
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Neural Networks (IJCNN), The 2011 International Joint Conference on
Conference_Location :
San Jose, CA
ISSN :
2161-4393
Print_ISBN :
978-1-4244-9635-8
Type :
conf
DOI :
10.1109/IJCNN.2011.6033362
Filename :
6033362