Title :
Q-Learning Algorithms for Constrained Markov Decision Processes With Randomized Monotone Policies: Application to MIMO Transmission Control
Author :
Djonin, Dejan V. ; Krishnamurthy, Vikram
Author_Institution :
Dyaptive Inc., Vancouver, BC
Date :
5/1/2007
Abstract :
This paper presents novel Q-learning-based stochastic control algorithms for rate and power control in V-BLAST transmission systems. The algorithms exploit the supermodularity and monotonic structure results derived in the companion paper. The rate and power control problem is posed as a stochastic optimization problem with the goal of minimizing the average transmission power subject to a constraint on the average delay, which can be interpreted as the quality-of-service requirement of a given application. The standard Q-learning algorithm is modified to handle the constraint so that it can adaptively learn the structured optimal policy for unknown channel/traffic statistics. We discuss the convergence of the proposed algorithms and explore their properties in simulations. To address the issue of unknown transmission costs in an unknown time-varying environment, we propose a variant of the Q-learning algorithm in which power costs are estimated in an online fashion, and we show that this algorithm converges to the optimal solution as long as the power cost estimates are asymptotically unbiased.
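Illustrative sketch (not from the paper): the Python fragment below shows, under simplifying assumptions, the general flavor of Lagrangian constrained Q-learning that the abstract alludes to, i.e., minimizing power cost plus a multiplier-weighted delay term while the multiplier is adapted on a slower timescale toward a delay bound. The buffer/channel sizes, dynamics, cost function, step sizes, and the discounted approximation are all hypothetical placeholders; the paper's average-cost formulation and its exploitation of the monotone/supermodular policy structure are not reproduced here.

import numpy as np

rng = np.random.default_rng(0)

N_BUFFER, N_CHANNEL, N_ACTIONS = 8, 4, 4   # toy buffer/channel/rate cardinalities
DELAY_BOUND = 2.0                          # average-delay (buffer occupancy) target
DISCOUNT = 0.95                            # discounted approximation of the average-cost problem
STEP_Q, STEP_LAMBDA = 0.1, 0.005           # two-timescale step sizes (fast Q, slow multiplier)

Q = np.zeros((N_BUFFER, N_CHANNEL, N_ACTIONS))   # Q(buffer, channel, rate) estimates
lam = 0.0                                        # Lagrange multiplier for the delay constraint

def power_cost(channel, rate):
    # Hypothetical per-slot power needed to transmit `rate` packets on `channel`.
    return rate * (1.0 + 1.0 / (channel + 1))

def transition(buffer, channel, rate):
    # Toy dynamics: serve `rate` packets, Poisson arrivals, i.i.d. channel state.
    arrivals = rng.poisson(1.0)
    next_buffer = min(max(buffer - rate, 0) + arrivals, N_BUFFER - 1)
    return next_buffer, int(rng.integers(N_CHANNEL))

buffer, channel = 0, 0
for t in range(100_000):
    # Epsilon-greedy exploration over transmission rates.
    if rng.random() < 0.1:
        rate = int(rng.integers(N_ACTIONS))
    else:
        rate = int(np.argmin(Q[buffer, channel]))

    # Lagrangian stage cost: transmit power plus lambda times the delay proxy.
    cost = power_cost(channel, rate) + lam * buffer

    next_buffer, next_channel = transition(buffer, channel, rate)

    # Q-learning update on the Lagrangian cost (fast timescale).
    td_target = cost + DISCOUNT * Q[next_buffer, next_channel].min()
    Q[buffer, channel, rate] += STEP_Q * (td_target - Q[buffer, channel, rate])

    # Slower multiplier update: increase lambda when delay exceeds the bound.
    lam = max(0.0, lam + STEP_LAMBDA * (buffer - DELAY_BOUND))

    buffer, channel = next_buffer, next_channel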
Keywords :
MIMO communication; Markov processes; adaptive control; learning systems; stochastic systems; telecommunication control; telecommunication traffic; time-varying channels; MIMO transmission control; Q-learning algorithms; V-BLAST transmission systems; channel-traffic statistics; constrained Markov decision processes; power control; randomized monotone policies; rate control; stochastic control algorithms; stochastic optimization problem; time-varying environment; Constraint optimization; Control systems; Cost function; Delay; Power control; Quality of service; Statistics; Stochastic processes; Stochastic systems; Traffic control; Q learning; Constrained Markov decision process (CMDP); V-BLAST; delay constraints; monotone policies; randomized policies; reinforcement learning; supermodularity; transmission scheduling;
Journal_Title :
Signal Processing, IEEE Transactions on
DOI :
10.1109/TSP.2007.893228