Title :
Data-driven schemes for resolving misspecified MDPs: Asymptotics and error analysis
Author :
Hao Jiang; Uday V. Shanbhag
Author_Institution :
Industrial and Enterprise Systems Engineering, University of Illinois at Urbana-Champaign, 104 S. Mathews Ave., 61801, USA
Abstract :
We consider the solution of a finite-state, infinite-horizon Markov Decision Process (MDP) in which both the transition matrix and the cost function are misspecified, the latter in a parametric sense. We study a data-driven regime in which the misspecification is resolved by solving a stochastic convex optimization (learning) problem. Within this framework, we make the following contributions: (1) we first show that a misspecified value iteration scheme converges almost surely to its true counterpart and that the mean-squared error after K iterations is O(1/K^{1/2-α}) with 0 < α < 1/2; (2) an analogous asymptotic almost-sure convergence statement is provided for misspecified policy iteration; and (3) finally, we present a constant-steplength misspecified Q-learning scheme and show that a suitable error metric is O(1/K^{1/2-α}) + O(√δ) with 0 < α < 1/2 after K iterations, where δ is a bound on the steplength.
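Illustration :
The abstract's first contribution couples a computational (value iteration) step with a data-driven learning update for the misspecified cost parameter. The Python sketch below is a minimal illustration of that coupling under stated assumptions, not the authors' scheme: the feature-based cost model phi, theta_true, noisy_cost_sample, the 1/k learning steplength, and the assumption of a known transition matrix are all illustrative choices.

import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, gamma = 5, 3, 0.9

# Transition matrices P[a] (row-stochastic), assumed known here for simplicity;
# the paper also allows the transition matrix itself to be misspecified.
P = rng.random((n_actions, n_states, n_states))
P /= P.sum(axis=2, keepdims=True)

# Parametric cost model c(s, a; theta) = phi[s, a] @ theta with unknown theta (hypothetical).
phi = rng.random((n_states, n_actions, 2))
theta_true = np.array([1.0, 0.5])

def noisy_cost_sample():
    """Noisy observation of the true costs (the data-driven part)."""
    return phi @ theta_true + 0.1 * rng.standard_normal((n_states, n_actions))

V = np.zeros(n_states)      # value-function iterate
theta = np.zeros(2)         # learning iterate for the misspecified parameter
K = 2000
for k in range(1, K + 1):
    # Learning step: one stochastic-gradient step on a least-squares fit of theta,
    # with a diminishing steplength 1/k (one standard choice; the paper analyzes rates).
    residual = phi @ theta - noisy_cost_sample()
    grad = 2.0 * np.einsum('saf,sa->f', phi, residual) / (n_states * n_actions)
    theta -= (1.0 / k) * grad

    # Computational step: value iteration using the current cost estimate c(., .; theta_k).
    c_est = phi @ theta
    Q = c_est + gamma * np.einsum('ask,k->sa', P, V)   # Q[s, a]
    V = Q.min(axis=1)                                  # cost-minimizing Bellman update

print("estimated theta:", theta)
print("greedy policy:", Q.argmin(axis=1))

The point of the sketch is only the coupling: as theta_k converges to the true parameter, the value-iteration sequence tracks the true Bellman operator, which is the behavior the abstract's error bounds quantify.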
Keywords :
"Convergence","Cost function","Markov processes","Modeling","Error analysis"
Conference_Title :
Winter Simulation Conference (WSC), 2015
Electronic_ISSN :
1558-4305
DOI :
10.1109/WSC.2015.7408537