Regional Information Center for Science and Technology - A novel Q-learning algorithm with function approximation for constrained Markov decision processes

DocumentCode :

1629388

Title :

A novel Q-learning algorithm with function approximation for constrained Markov decision processes

Author :

Lakshmanan, K. ; Bhatnagar, Shalabh

Author_Institution :

Dept. of Comput. Sci. & Autom., Indian Inst. of Sci., Bangalore, India

Year :

2012

Firstpage :

400

Lastpage :

405

Abstract :

We present a novel multi-timescale Q-learning algorithm for average cost control in a Markov decision process subject to multiple inequality constraints. We formulate a relaxed version of this problem through the Lagrange multiplier method. Our algorithm is different from Q-learning in that it updates two parameters - a Q-value parameter and a policy parameter. The Q-value parameter is updated on a slower time scale as compared to the policy parameter. Whereas Q-learning with function approximation can diverge in some cases, our algorithm is seen to be convergent as a result of the aforementioned timescale separation. We show the results of experiments on a problem of constrained routing in a multistage queueing network. Our algorithm is seen to exhibit good performance and the various inequality constraints are seen to be satisfied upon convergence of the algorithm.
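
The Lagrange multiplier relaxation mentioned in the abstract can be illustrated with a minimal sketch. This is a generic textbook form, not the authors' algorithm: the function names, the symbols (single-stage cost `cost`, constraint costs `g_k`, thresholds `alpha_k`, multipliers `lambda_k`), and the projected-ascent multiplier update are all assumptions made for illustration, since the record does not give the paper's actual update rules.

```python
from typing import List, Sequence


def relaxed_cost(cost: float,
                 constraint_costs: Sequence[float],
                 thresholds: Sequence[float],
                 multipliers: Sequence[float]) -> float:
    """Lagrangian single-stage cost: c + sum_k lambda_k * (g_k - alpha_k).

    Hypothetical helper: folds the inequality constraints g_k <= alpha_k
    into the objective, weighted by nonnegative multipliers lambda_k.
    """
    penalty = sum(lam * (g - alpha)
                  for lam, g, alpha in zip(multipliers, constraint_costs, thresholds))
    return cost + penalty


def update_multipliers(multipliers: Sequence[float],
                       constraint_costs: Sequence[float],
                       thresholds: Sequence[float],
                       step: float) -> List[float]:
    """One projected-ascent step on the multipliers (assumed update rule).

    lambda_k grows while constraint k is violated (g_k > alpha_k) and
    shrinks otherwise; the max(0, .) projection keeps it nonnegative.
    """
    return [max(0.0, lam + step * (g - alpha))
            for lam, g, alpha in zip(multipliers, constraint_costs, thresholds)]
```

In a multi-timescale scheme of the kind the abstract describes, each quantity would be updated with its own step-size schedule; which quantity runs on which timescale should be taken from the paper itself rather than from this sketch.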

Keywords :

Markov processes; decision theory; function approximation; learning (artificial intelligence); network theory (graphs); parameter estimation; queueing theory; Lagrange multiplier method; Q-value parameter; average cost control; constrained Markov decision processes; constrained routing problem; function approximation; inequality constraints; multistage queueing network; multitimescale Q-learning algorithm; parameter update; policy parameter; Approximation algorithms; Function approximation; Markov processes; Minimization; Routing; Vectors; Zinc; Constrained MDP; Lagrange multiplier method; Q-learning with linear function approximation; multi-stage stochastic shortest path problem; reinforcement learning;

Language :

English

Publisher :

IEEE

Conference_Title :

2012 50th Annual Allerton Conference on Communication, Control, and Computing (Allerton)

Conference_Location :

Monticello, IL

Print_ISBN :

978-1-4673-4537-8

Type :

conf

DOI :

10.1109/Allerton.2012.6483246

Filename :

6483246

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=1629388