Regional Information Center for Science and Technology (RICeST) - Online Markov Decision Processes With Kullback

DocumentCode :

67559

Title :

Online Markov Decision Processes With Kullback–Leibler Control Cost

Author :

Guan, Peng ; Raginsky, Maxim ; Willett, Rebecca M.

Author_Institution :

Dept. of Electr. & Comput. Eng., Duke Univ., Durham, NC, USA

Volume :

Issue :

fYear :

2014

fDate :

June 2014

Firstpage :

1423

Lastpage :

1438

Abstract :

This paper considers an online (real-time) control problem that involves an agent performing a discrete-time random walk over a finite state space. The agent's action at each time step is to specify the probability distribution for the next state given the current state. Following the setup of Todorov, the state-action cost at each time step is a sum of a state cost and a control cost given by the Kullback-Leibler (KL) divergence between the agent's next-state distribution and that determined by some fixed passive dynamics. The online aspect of the problem is due to the fact that the state cost functions are generated by a dynamic environment, and the agent learns the current state cost only after selecting an action. An explicit construction of a computationally efficient strategy with small regret (i.e., expected difference between its actual total cost and the smallest cost attainable using noncausal knowledge of the state costs) under mild regularity conditions is presented, along with a demonstration of the performance of the proposed strategy on a simulated target tracking problem. A number of new results on Markov decision processes with KL control cost are also obtained.
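The per-step cost described in the abstract (a state cost plus the KL divergence between the agent's chosen next-state distribution and the passive dynamics) can be illustrated with a minimal sketch. The function names, the two-state example, and the numeric values below are assumptions made for illustration only; they are not taken from the paper, and this is not the authors' strategy, only the cost being evaluated at a single state.

```python
import math


def kl_divergence(p, q):
    """KL divergence D(p || q) between two discrete distributions (lists of probabilities)."""
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i > 0.0:
            if q_i == 0.0:
                # p puts mass where q has none: the divergence is infinite.
                return math.inf
            total += p_i * math.log(p_i / q_i)
    return total


def state_action_cost(state_cost, next_state_dist, passive_dist):
    """State cost plus KL control cost, as described in the abstract.

    next_state_dist: the agent's chosen distribution over next states (hypothetical values).
    passive_dist:    the passive dynamics' distribution over next states from the same state.
    """
    return state_cost + kl_divergence(next_state_dist, passive_dist)


# Hypothetical two-state example: passive dynamics move to either state with equal probability.
passive = [0.5, 0.5]

# Following the passive dynamics exactly incurs no control cost.
cost_passive = state_action_cost(1.0, [0.5, 0.5], passive)

# Steering strongly toward state 0 adds a positive KL penalty on top of the state cost.
cost_steered = state_action_cost(1.0, [0.9, 0.1], passive)
```

In this sketch the control cost is zero only when the agent's distribution coincides with the passive dynamics, which is why deviating from them is penalized in proportion to how far the chosen distribution moves.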

Keywords :

Markov processes; discrete time systems; learning systems; KL divergence; Kullback-Leibler control cost; Kullback-Leibler divergence; agent next-state distribution; agent passive dynamics; discrete-time random walk; mild regularity conditions; online Markov decision process; online control problem; probability distribution; simulated target tracking problem; state cost functions; state-action cost; Aerospace electronics; Cost function; Entropy; Probability distribution; State feedback; Target tracking; Markov decision processes; online learning; stochastic control

fLanguage :

English

Journal_Title :

Automatic Control, IEEE Transactions on

Publisher :

IEEE

ISSN :

0018-9286

Type :

jour

DOI :

10.1109/TAC.2014.2301558

Filename :

6716965

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=67559