DocumentCode
847996
Title
Reinforcement Learning for Resource Allocation in LEO Satellite Networks
Author
Usaha, Wipawee ; Barria, Javier A.
Author_Institution
Sch. of Telecommun. Eng., Suranaree Univ. of Technol., Nakorn Ratchasima
Volume
37
Issue
3
fYear
2007
fDate
6/1/2007 12:00:00 AM
Firstpage
515
Lastpage
527
Abstract
In this paper, we develop and assess online decision-making algorithms for call admission and routing for low Earth orbit (LEO) satellite networks. It has been shown in a recent paper that, in a LEO satellite system, a semi-Markov decision process formulation of the call admission and routing problem can achieve better performance in terms of an average revenue function than existing routing methods. However, the conventional dynamic programming (DP) numerical solution becomes prohibitive as the problem size increases. In this paper, two solution methods based on reinforcement learning (RL) are proposed in order to circumvent the computational burden of DP. The first method is based on an actor-critic method with temporal-difference (TD) learning. The second method is based on a critic-only method, called optimistic TD learning. The algorithms enhance performance in terms of storage requirements, computational complexity, and computational time, and in terms of an overall long-term average revenue function that penalizes blocked calls. Numerical studies are carried out, and the results obtained show that the RL framework can achieve up to 56% higher average revenue than existing routing methods used in LEO satellite networks, with reasonable storage and computational requirements.
Keywords
Markov processes; decision making; dynamic programming; learning (artificial intelligence); resource allocation; satellite communication; telecommunication computing; telecommunication congestion control; telecommunication network routing; call admission control; computational complexity; dynamic programming; low Earth orbit satellite network; online decision-making algorithm; reinforcement learning; resource allocation; semi-Markov decision process; temporal-difference learning; Bandwidth; Costs; Dynamic programming; Learning; Low earth orbit satellites; Propagation delay; Resource management; Routing; Satellite broadcasting; Topology; Call admission control (CAC); low Earth orbit (LEO) satellite network; reinforcement learning (RL); routing; temporal-difference (TD) learning; Algorithms; Artificial Intelligence; Computer Communication Networks; Decision Support Techniques; Pattern Recognition, Automated; Resource Allocation; Signal Processing, Computer-Assisted; Spacecraft;
fLanguage
English
Journal_Title
Systems, Man, and Cybernetics, Part B: Cybernetics, IEEE Transactions on
Publisher
ieee
ISSN
1083-4419
Type
jour
DOI
10.1109/TSMCB.2006.886173
Filename
4200818