Title :
Finite horizon Markov control with one-step variance penalties
Author_Institution :
Dept. of Eng. Manage. & Syst. Eng., Missouri Univ. of Sci. & Technol., Rolla, MO, USA
Date :
Sept. 29 - Oct. 1, 2010
Abstract :
Variance-penalized Markov decision processes (MDPs) over an infinite time horizon have been studied in the literature for both asymptotic and one-step variance; in these models, the objective function is typically the expected long-run reward minus a constant times the variance, with variance serving as a measure of risk. For the finite time horizon, asymptotic variance has been considered by Collins, but that model accounts only for a terminal reward, i.e., reward is earned only at the end of the horizon. In this paper, we develop a framework for one-step variance penalties over a finite time horizon in which rewards can be non-zero in every state. We derive a solution algorithm based on the stochastic shortest path algorithm of Bertsekas and Tsitsiklis. We also present a Q-Learning algorithm for the simulation-based setting, which applies when the transition probability model is unavailable, along with some preliminary convergence results.
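As a rough illustration of the criterion described above (the notation here is assumed for illustration and is not taken from the paper), a one-step variance-penalized finite-horizon objective can be sketched as
\[
J_\lambda^{\pi} \;=\; \mathbb{E}^{\pi}\!\left[\sum_{t=0}^{T-1} r(x_t, a_t)\right] \;-\; \lambda \sum_{t=0}^{T-1} \mathrm{Var}^{\pi}\!\big(r(x_t, a_t)\big),
\]
where $\pi$ is the policy, $T$ the horizon length, $r(x_t, a_t)$ the one-step reward in state $x_t$ under action $a_t$, and $\lambda \ge 0$ the constant that penalizes one-step variance as a proxy for risk.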
Keywords :
Markov processes; convergence; graph theory; infinite horizon; learning (artificial intelligence); probability; MDP; Q-learning algorithm; asymptotic variance; finite horizon Markov control; infinite time horizon; long-run reward; objective function; one-step variance penalty; preliminary convergence results; risk measurement; simulation-based scenario; solution algorithm; stochastic shortest path algorithm; terminal reward; transition probability model; variance-penalized Markov decision processes; Convergence; Equations; Industries; Learning; Mathematical model
Conference_Title :
2010 48th Annual Allerton Conference on Communication, Control, and Computing (Allerton)
Conference_Location :
Allerton, IL
Print_ISBN :
978-1-4244-8215-3
DOI :
10.1109/ALLERTON.2010.5707071