• DocumentCode
    574341
  • Title
    Online Markov decision processes with Kullback-Leibler control cost
  • Author
    Guan, Peng; Raginsky, Maxim; Willett, Rebecca
  • Author_Institution
    Dept. of Electr. & Comput. Eng., Duke Univ., Durham, NC, USA
  • fYear
    2012
  • fDate
    27-29 June 2012
  • Firstpage
    1388
  • Lastpage
    1393
  • Abstract
    We consider an online (real-time) control problem that involves an agent performing a discrete-time random walk over a finite state space. The agent's action at each time step is to specify the probability distribution for the next state given the current state. Following the set-up of Todorov (2007, 2009), the state-action cost at each time step is a sum of a nonnegative state cost and a control cost given by the Kullback-Leibler divergence between the agent's next-state distribution and that determined by some fixed passive dynamics. The online aspect of the problem is due to the fact that the state cost functions are generated by a dynamic environment, and the agent learns the current state cost only after having selected the corresponding action. We give an explicit construction of an efficient strategy that has small regret (i.e., the difference between the total state-action cost incurred causally and the smallest cost attainable using noncausal knowledge of the state costs) under mild regularity conditions on the passive dynamics. We demonstrate the performance of our proposed strategy on a simulated target tracking problem.
  • Keywords
    Markov processes; decision making; learning (artificial intelligence); multi-agent systems; state-space methods; statistical distributions; Kullback-Leibler control cost; Kullback-Leibler divergence; discrete-time random walk; finite state space; fixed passive dynamics; noncausal knowledge; nonnegative state cost; online Markov decision processes; online real-time control problem; probability distribution; sequential decision-making; simulated target tracking problem; total state-action cost; Aerospace electronics; Cost function; Markov processes; Probability distribution; State feedback; Steady-state; Target tracking;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    American Control Conference (ACC), 2012
  • Conference_Location
    Montreal, QC
  • ISSN
    0743-1619
  • Print_ISBN
    978-1-4577-1095-7
  • Electronic_ISBN
    0743-1619
  • Type
    conf
  • DOI
    10.1109/ACC.2012.6314926
  • Filename
    6314926
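
The per-step cost described in the abstract (a nonnegative state cost plus the Kullback-Leibler divergence between the agent's chosen next-state distribution and the passive dynamics) can be illustrated with a minimal sketch. The function names, the three-state distributions, and the state cost of 1.0 below are hypothetical and chosen only for illustration; they are not taken from the paper, and the sketch does not implement the paper's regret-minimizing strategy.

```python
import math


def kl_divergence(p, q):
    """KL divergence D(p || q), in nats, between two discrete distributions."""
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i > 0.0:
            if q_i == 0.0:
                # p puts mass on a state the passive dynamics never reach.
                return math.inf
            total += p_i * math.log(p_i / q_i)
    return total


def step_cost(state_cost, next_state_dist, passive_dist):
    """One-step state-action cost: state cost plus KL control cost."""
    return state_cost + kl_divergence(next_state_dist, passive_dist)


# Hypothetical example: rows of the passive and controlled transition
# kernels for a single current state in a 3-state chain.
passive = [0.5, 0.25, 0.25]
chosen = [0.8, 0.1, 0.1]

free_cost = step_cost(1.0, passive, passive)     # following passive dynamics
steered_cost = step_cost(1.0, chosen, passive)   # deviating from them
```

Following the passive dynamics exactly incurs no control cost, so `free_cost` equals the state cost alone, while any deviation, as in `steered_cost`, adds a strictly positive KL penalty.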