Dynamic Optimization and Learning for Renewal Systems

Author

Neely, Michael J.

Author_Institution

Electr. Eng. Dept., Univ. of Southern California, Los Angeles, CA, USA

Volume

58

Issue

1

fYear

2013

fDate

Jan. 2013

Firstpage

32

Lastpage

46

Abstract

This paper considers optimization of time averages in systems with variable length renewal frames. Applications include power-aware and profit-aware scheduling in wireless networks, peer-to-peer networks, and transportation systems. Every frame, a new policy is implemented that affects the frame size and that creates a vector of attributes. The policy can be a single decision in response to a random event observed on the frame, or a sequence of such decisions. The goal is to choose policies on each frame in order to maximize a time average of one attribute, subject to additional time average constraints on the others. Two algorithms are developed, both based on Lyapunov optimization concepts. The first makes decisions to minimize a “drift-plus-penalty” ratio over each frame. The second is similar but does not involve a ratio. For systems that make only a single decision on each frame, both algorithms are shown to learn efficient behavior without a-priori statistical knowledge. The framework is also applicable to large classes of constrained Markov decision problems. Such problems are reduced to finding an approximate solution to a simpler unconstrained stochastic shortest path problem on each frame. Approximations for the simpler problem may still suffer from a curse of dimensionality for systems with large state space. However, our results are compatible with any approximation method, and demonstrate an explicit tradeoff between performance and convergence time.

Keywords

Lyapunov methods; Markov processes; decision making; optimisation; stochastic systems; Lyapunov optimization concepts; constrained Markov decision problems; drift-plus-penalty ratio; dynamic optimization; optimization; peer-to-peer networks; power-aware scheduling; profit-aware scheduling; renewal systems; time average constraints; time averages; transportation systems; unconstrained stochastic shortest path problem; variable length renewal frames; wireless networks; Algorithm design and analysis; Approximation methods; Convergence; Heuristic algorithms; Optimization; Protocols; Vectors; Markov decision problems (MDPs); stochastic processes;

fLanguage

English

Journal_Title

Automatic Control, IEEE Transactions on

Publisher

ieee

ISSN

0018-9286

Type

jour

DOI

10.1109/TAC.2012.2204831

Filename

6218174