Reinforcement Learning for Resource Allocation in LEO Satellite Networks

Author

Usaha, Wipawee ; Barria, Javier A.

Author_Institution

Sch. of Telecommun. Eng., Suranaree Univ. of Technol., Nakorn Ratchasima

Volume

37

Issue

3

fYear

2007

fDate

6/1/2007

Firstpage

515

Lastpage

527

Abstract

In this paper, we develop and assess online decision-making algorithms for call admission and routing in low Earth orbit (LEO) satellite networks. A recent paper has shown that, in a LEO satellite system, a semi-Markov decision process formulation of the call admission and routing problem can achieve better performance, in terms of an average revenue function, than existing routing methods. However, the conventional dynamic programming (DP) numerical solution becomes computationally prohibitive as the problem size increases. In this paper, two solution methods based on reinforcement learning (RL) are proposed to circumvent the computational burden of DP. The first is an actor-critic method with temporal-difference (TD) learning. The second is a critic-only method, called optimistic TD learning. The algorithms improve performance both in terms of storage requirements, computational complexity, and computational time, and in terms of an overall long-term average revenue function that penalizes blocked calls. Numerical studies are carried out, and the results show that the RL framework can achieve up to 56% higher average revenue than existing routing methods used in LEO satellite networks, with reasonable storage and computational requirements.

Keywords

Markov processes; decision making; dynamic programming; learning (artificial intelligence); resource allocation; satellite communication; telecommunication computing; telecommunication congestion control; telecommunication network routing; call admission control; computational complexity; dynamic programming; low Earth orbit satellite network; online decision-making algorithm; reinforcement learning; resource allocation; semi-Markov decision process; temporal-difference learning; Bandwidth; Costs; Dynamic programming; Learning; Low earth orbit satellites; Propagation delay; Resource management; Routing; Satellite broadcasting; Topology; Call admission control (CAC); low Earth orbit (LEO) satellite network; reinforcement learning (RL); routing; temporal-difference (TD) learning; Algorithms; Artificial Intelligence; Computer Communication Networks; Decision Support Techniques; Pattern Recognition, Automated; Resource Allocation; Signal Processing, Computer-Assisted; Spacecraft;

fLanguage

English

Journal_Title

Systems, Man, and Cybernetics, Part B: Cybernetics, IEEE Transactions on

Publisher

IEEE

ISSN

1083-4419

Type

jour

DOI

10.1109/TSMCB.2006.886173

Filename

4200818