Adaptive proportional fair parameterization based LTE scheduling using continuous actor-critic reinforcement learning

Author

Comsa, Ioan Sorin; Zhang, Sijing; Aydin, Mehmet; Chen, Jianping; Kuonen, Pierre; Wagen, Jean-Frederic

Author_Institution

Inst. for Res. in Applicable Comput., Univ. of Bedfordshire, Luton, UK

fYear

2014

fDate

8-12 Dec. 2014

Firstpage

4387

Lastpage

4393

Abstract

Maintaining a desired trade-off between system throughput maximization and user fairness satisfaction remains a problem that is far from solved. In LTE systems, different trade-off levels can be obtained through a proper parameterization of the Generalized Proportional Fair (GPF) scheduling rule. Our approach finds the best parameterization policy that maximizes system throughput under the different fairness constraints imposed by the scheduler state. The proposed method adapts and refines the policy at each Transmission Time Interval (TTI) by using a Multi-Layer Perceptron Neural Network (MLPNN) as a non-linear function approximator between the continuous scheduler state and the optimal GPF parameter(s). The MLPNN is trained with Continuous Actor-Critic Learning Automata Reinforcement Learning (CACLA RL). The double GPF parameterization optimization problem is addressed by using CACLA RL with two continuous actions (CACLA-2). Five reinforcement learning algorithms that act as simple parameterization techniques are compared against the proposed technique. Simulation results indicate that CACLA-2 performs much better than any of the other candidates that adjust only one scheduling parameter, such as CACLA-1. CACLA-2 outperforms CACLA-1 by reducing the percentage of TTIs in which the system is considered unfair. By attenuating the fluctuations of the obtained policy, CACLA-2 achieves a higher throughput gain when severe changes in the scheduling environment occur, while maintaining the fairness optimality condition at the same time.
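
The two ideas named in the abstract, a parameterized GPF rule and a CACLA update, can be illustrated with a minimal sketch. It rests on several assumptions that the abstract does not confirm: the GPF metric is taken in its commonly used form r^alpha / R^beta (instantaneous rate r, average rate R), the actor and critic are linear functions rather than the MLPNN used in the paper, and a single continuous action is learned (as in CACLA-1). The names gpf_select, cacla_step, lr and gamma are hypothetical, and exploration noise on the action is left to the caller.

```python
def gpf_select(inst_rates, avg_rates, alpha, beta):
    """Pick the user with the largest GPF metric r**alpha / R**beta.

    Assumed form of the rule: alpha weights the instantaneous rate,
    beta weights the past average rate (alpha = beta = 1 is plain PF).
    """
    metrics = [(r ** alpha) / (R ** beta) for r, R in zip(inst_rates, avg_rates)]
    return max(range(len(metrics)), key=metrics.__getitem__)


def dot(weights, features):
    """Linear function approximator (a stand-in for the paper's MLPNN)."""
    return sum(w * x for w, x in zip(weights, features))


def cacla_step(actor_w, critic_w, state, action, reward, next_state,
               lr=0.1, gamma=0.9):
    """One CACLA update; modifies actor_w and critic_w in place.

    The critic follows the usual temporal-difference rule. The actor is
    moved toward the action that was actually taken only when the TD
    error is positive, which is the defining feature of CACLA.
    """
    td_error = reward + gamma * dot(critic_w, next_state) - dot(critic_w, state)
    for i, x in enumerate(state):
        critic_w[i] += lr * td_error * x
    if td_error > 0:
        action_error = action - dot(actor_w, state)
        for i, x in enumerate(state):
            actor_w[i] += lr * action_error * x
    return td_error
```

In this sketch, the action produced by the actor would play the role of a GPF parameter such as alpha or beta, which the scheduler then passes to gpf_select at the next TTI. Setting beta to 0 makes the rule favor the user with the highest instantaneous rate, while setting alpha to 0 favors the user with the lowest average rate.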

Keywords

Long Term Evolution; approximation theory; learning (artificial intelligence); learning automata; multilayer perceptrons; optimisation; telecommunication computing; telecommunication scheduling; CACLA RL; GPF scheduling rule; LTE scheduling; MLPNN; TTI; adaptive proportional fair parameterization optimization problem; continuous actor-critic reinforcement learning; generalized proportional fair scheduling rule; multilayer perceptron neural network; nonlinear function approximation; system throughput maximization; transmission time interval; user fairness satisfaction; Aerospace electronics; Measurement; Optimal scheduling; Throughput; Training; Wireless communication; CACLA-1; CACLA-2; CQI; GPF; LTE-A; MLPNN; RL; TTI; fairness; policy; scheduling rule; throughput;

fLanguage

English

Publisher

IEEE

Conference_Titel

Global Communications Conference (GLOBECOM), 2014 IEEE

Conference_Location

Austin, TX

Type

conf

DOI

10.1109/GLOCOM.2014.7037498

Filename

7037498

Link To Document

https://search.isc.ac/dl/search/defaultta.aspx?DTC=49&DC=266757