  • DocumentCode
    1476046
  • Title
    Universal Reinforcement Learning
  • Author
    Farias, Vivek F.; Moallemi, Ciamac C.; Van Roy, Benjamin; Weissman, Tsachy
  • Author_Institution
    Sloan School of Management, Massachusetts Institute of Technology, Cambridge, MA, USA
  • Volume
    56
  • Issue
    5
  • fYear
    2010
  • fDate
    5/1/2010
  • Firstpage
    2441
  • Lastpage
    2454
  • Abstract
    We consider an agent interacting with an unmodeled environment. At each time, the agent makes an observation, takes an action, and incurs a cost. Its actions can influence future observations and costs. The goal is to minimize the long-term average cost. We propose a novel algorithm, which we call the active LZ algorithm, for optimal control based on ideas from the Lempel-Ziv scheme for universal data compression and prediction. We establish that, under the active LZ algorithm, if there exists an integer K such that the future is conditionally independent of the past given a window of K consecutive actions and observations, then the average cost converges to the optimum. Experimental results involving the game of Rock-Paper-Scissors illustrate the merits of the algorithm.
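
    As a rough illustration of the approach sketched in the abstract, the Python snippet below grows an LZ78-style context tree over the interleaved action-observation stream, estimates next-observation probabilities from the counts stored at each context, and picks actions greedily against estimated one-step costs with occasional random exploration. This is a minimal sketch of the flavor of the method, not the authors' active LZ algorithm: the class name, the Laplace smoothing, the exploration rate, and the one-step (rather than long-term average) cost criterion are all illustrative assumptions.

    import random
    from collections import defaultdict

    class ActiveLZSketch:
        """Hypothetical sketch of an LZ78-style context-tree learner.

        Illustrates the core idea only: parse the action-observation
        stream into Lempel-Ziv phrases, count what follows each
        context, and choose actions from the resulting predictions.
        """

        def __init__(self, actions, observations, cost, explore=0.1):
            self.actions = list(actions)
            self.observations = list(observations)
            self.cost = cost            # cost(action, observation) -> float
            self.explore = explore      # exploration rate (assumed, not from the paper)
            self.counts = {}            # context tuple -> {observation: count}
            self.phrases = {()}         # LZ phrases seen so far
            self.context = ()           # current (growing) LZ phrase

        def _predict(self, context, action):
            """Estimate P(next obs | context, action) with Laplace smoothing."""
            c = self.counts.get(context + (action,), {})
            total = sum(c.values()) + len(self.observations)
            return {o: (c.get(o, 0) + 1) / total for o in self.observations}

        def act(self):
            """Explore occasionally; otherwise minimize estimated one-step cost."""
            if random.random() < self.explore:
                return random.choice(self.actions)
            def expected_cost(a):
                p = self._predict(self.context, a)
                return sum(p[o] * self.cost(a, o) for o in self.observations)
            return min(self.actions, key=expected_cost)

        def update(self, action, observation):
            """Record the transition and advance the LZ78-style parsing."""
            key = self.context + (action,)
            self.counts.setdefault(key, defaultdict(int))[observation] += 1
            phrase = key + (observation,)
            if phrase in self.phrases:
                self.context = phrase     # known phrase: keep extending it
            else:
                self.phrases.add(phrase)  # novel phrase: record it and
                self.context = ()         # restart parsing, as in LZ78

    # Toy usage: rock-paper-scissors against a cyclic opponent.
    moves = ["R", "P", "S"]
    beats = {"R": "P", "P": "S", "S": "R"}   # beats[o] is the move that beats o
    def rps_cost(a, o):                      # 0 = win, 1 = tie, 2 = loss
        return 0.0 if a == beats[o] else (1.0 if a == o else 2.0)

    agent = ActiveLZSketch(moves, moves, rps_cost)
    for o in "RPS" * 300:                    # opponent cycles R, P, S, ...
        a = agent.act()
        agent.update(a, o)

    Against the cyclic opponent above, the growing contexts should eventually encode the opponent's position in its cycle, so the agent's predictions sharpen and its average cost should fall toward that of winning nearly every round.
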
  • Keywords
    data compression; dynamic programming; learning (artificial intelligence); optimal control; Lempel-Ziv scheme; active LZ algorithm; agent interaction; consecutive actions; consecutive observations; prediction; rock-paper-scissors game; universal reinforcement learning; cost function; information theory; context tree; Lempel-Ziv; reinforcement learning; value iteration
  • fLanguage
    English
  • Journal_Title
    IEEE Transactions on Information Theory
  • Publisher
    IEEE
  • ISSN
    0018-9448
  • Type
    jour
  • DOI
    10.1109/TIT.2010.2043762
  • Filename
    5452185