Regional Information Center for Science and Technology - Approximate Robust Policy Iteration for Discounted Infinite-Horizon Markov Decision Processes with Uncertain Stationary Parametric Transition Matrices

DocumentCode :

1947623

Title :

Approximate Robust Policy Iteration for Discounted Infinite-Horizon Markov Decision Processes with Uncertain Stationary Parametric Transition Matrices

Author :

Li, Baohua ; Si, Jennie

Author_Institution :

Arizona State Univ., Tempe

fYear :

2007

fDate :

12-17 Aug. 2007

Firstpage :

2052

Lastpage :

2057

Abstract :

We consider Markov decision processes with finite states, finite actions, and discounted infinite-horizon cost in the deterministic policy space. State transition matrices are uncertain but with stationary parameterization. The uncertainty in transition matrices signifies realistic considerations that an accurate system model is not available for the controller design due to limitations in estimation methods and model deficiencies. Based on the quadratic total value function formulation, two approximate robust policy iterations are developed, the performance errors of which are guaranteed to be within an arbitrarily small error bound. The two approximations make use of iterative aggregation and multilayer perceptron, respectively. It is proved that the robust policy iteration based on approximation with iterative aggregation converges surely to a stationary optimal or near-optimal policy, and also that under some conditions the robust policy iteration based on approximation with multilayer perceptron converges in a probability sense to a stationary near-optimal policy. Furthermore, under some assumptions, the stationary solutions are guaranteed to be near-optimal in the deterministic policy space.
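To make the setting in the abstract concrete, the following is a minimal sketch of exact robust policy iteration for a finite-state, finite-action, discounted-cost MDP. It is not the authors' algorithm: it does not use iterative aggregation or a multilayer perceptron, and it assumes the uncertainty is a small finite set of candidate transition models with the worst case taken independently per state-action pair (a rectangularity assumption that relaxes the stationary parameterization described above). All names (`robust_q`, `robust_policy_evaluation`, `robust_policy_iteration`) and the random test problem are hypothetical.

```python
import numpy as np

def robust_q(P, c, V, gamma):
    """Worst-case Q-values. P: (K, A, S, S) candidate models, c: (S, A) costs."""
    expected_next = P @ V                # (K, A, S): expected next cost per model
    worst = expected_next.max(axis=0)    # (A, S): adversary picks the costliest model
    return c + gamma * worst.T           # (S, A)

def robust_policy_evaluation(P, c, policy, gamma, tol=1e-10, max_iter=10_000):
    """Iterate the robust Bellman operator of a fixed policy to its fixed point."""
    S = c.shape[0]
    V = np.zeros(S)
    for _ in range(max_iter):
        V_new = robust_q(P, c, V, gamma)[np.arange(S), policy]
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new
    return V

def robust_policy_iteration(P, c, gamma=0.9, max_sweeps=100):
    """Alternate robust evaluation and greedy improvement over deterministic policies."""
    S, _ = c.shape
    policy = np.zeros(S, dtype=int)
    V = np.zeros(S)
    for _ in range(max_sweeps):
        V = robust_policy_evaluation(P, c, policy, gamma)
        Q = robust_q(P, c, V, gamma)
        greedy = Q.argmin(axis=1)
        # Switch actions only on a strict improvement, so ties cannot cause cycling.
        improved = Q[np.arange(S), greedy] < V - 1e-9
        new_policy = np.where(improved, greedy, policy)
        if np.array_equal(new_policy, policy):
            break
        policy = new_policy
    return policy, V

# Hypothetical test problem: 3 candidate models, 2 actions, 4 states.
rng = np.random.default_rng(0)
K, A, S = 3, 2, 4
P = rng.random((K, A, S, S))
P /= P.sum(axis=-1, keepdims=True)       # normalise rows into probability vectors
c = rng.random((S, A))
policy, V = robust_policy_iteration(P, c, gamma=0.9)
print("policy:", policy)
print("worst-case cost-to-go:", V)
```

Under these assumptions the robust Bellman operator is a contraction with modulus `gamma`, so the evaluation loop converges, and the returned cost-to-go is an upper bound on the policy's cost under any single candidate model.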

Keywords :

Markov processes; estimation theory; infinite horizon; iterative methods; matrix algebra; multilayer perceptrons; approximate robust policy iteration; controller design; deterministic policy space; discounted infinite-horizon Markov decision processes; estimation methods; iterative aggregation; multilayer perceptron; near-optimal policy; quadratic total value function formulation; state transition matrices; stationary optimal policy; stationary parameterization; stationary parametric transition matrices; Costs; Equations; Estimation error; Iterative algorithms; Iterative methods; Multilayer perceptrons; Neural networks; Robustness; Space stations; Uncertainty;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Neural Networks, 2007. IJCNN 2007. International Joint Conference on

Conference_Location :

Orlando, FL

ISSN :

1098-7576

Print_ISBN :

978-1-4244-1379-9

Electronic_ISBN :

1098-7576

Type :

conf

DOI :

10.1109/IJCNN.2007.4371274

Filename :

4371274

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=1947623