Title :
Data-driven schemes for resolving misspecified MDPs: Asymptotics and error analysis
Author :
Hao Jiang; Uday V. Shanbhag
Author_Institution :
Industrial and Enterprise Systems Engineering, University of Illinois at Urbana-Champaign, 104 S. Mathews Ave., 61801, USA
Abstract :
We consider the solution of a finite-state, infinite-horizon Markov Decision Process (MDP) in which both the transition matrix and the cost function are misspecified, the latter in a parametric sense. We study a data-driven regime in which the misspecification is resolved by solving a stochastic convex optimization (learning) problem. Within this framework, we make the following contributions: (1) we first show that a misspecified value iteration scheme converges almost surely to its true counterpart and that the mean-squared error after K iterations is O(1/K^{1/2-α}) with 0 < α < 1/2; (2) an analogous asymptotic almost-sure convergence statement is provided for misspecified policy iteration; and (3) finally, we present a constant-steplength misspecified Q-learning scheme and show that a suitable error metric is O(1/K^{1/2-α}) + O(√δ) with 0 < α < 1/2 after K iterations, where δ is a bound on the steplength.
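Illustration :
The abstract's first contribution couples a computational (value iteration) step with a data-driven learning update for the misspecified cost parameter. The Python sketch below is a minimal illustration of that coupling under stated assumptions, not the authors' scheme: the feature-based cost model phi, theta_true, noisy_cost_sample, the 1/k learning steplength, and the assumption of a known transition matrix are all illustrative choices.

import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, gamma = 5, 3, 0.9

# Transition matrices P[a] (row-stochastic), assumed known here for simplicity;
# the paper also allows the transition matrix itself to be misspecified.
P = rng.random((n_actions, n_states, n_states))
P /= P.sum(axis=2, keepdims=True)

# Parametric cost model c(s, a; theta) = phi[s, a] @ theta with unknown theta (hypothetical).
phi = rng.random((n_states, n_actions, 2))
theta_true = np.array([1.0, 0.5])

def noisy_cost_sample():
    """Noisy observation of the true costs (the data-driven part)."""
    return phi @ theta_true + 0.1 * rng.standard_normal((n_states, n_actions))

V = np.zeros(n_states)      # value-function iterate
theta = np.zeros(2)         # learning iterate for the misspecified parameter
K = 2000
for k in range(1, K + 1):
    # Learning step: one stochastic-gradient step on a least-squares fit of theta,
    # with a diminishing steplength 1/k (one standard choice; the paper analyzes rates).
    residual = phi @ theta - noisy_cost_sample()
    grad = 2.0 * np.einsum('saf,sa->f', phi, residual) / (n_states * n_actions)
    theta -= (1.0 / k) * grad

    # Computational step: value iteration using the current cost estimate c(., .; theta_k).
    c_est = phi @ theta
    Q = c_est + gamma * np.einsum('ask,k->sa', P, V)   # Q[s, a]
    V = Q.min(axis=1)                                  # cost-minimizing Bellman update

print("estimated theta:", theta)
print("greedy policy:", Q.argmin(axis=1))

The point of the sketch is only the coupling: as theta_k converges to the true parameter, the value-iteration sequence tracks the true Bellman operator, which is the behavior the abstract's error bounds quantify.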
Keywords :
"Convergence","Cost function","Markov processes","Modeling","Error analysis"
Conference_Title :
Winter Simulation Conference (WSC), 2015
Electronic_ISSN :
1558-4305
DOI :
10.1109/WSC.2015.7408537