Title :
Universal Reinforcement Learning
Author :
Farias, Vivek F. ; Moallemi, Ciamac C. ; Van Roy, Benjamin ; Weissman, Tsachy
Author_Institution :
Sloan Sch. of Manage., Massachusetts Inst. of Technol., Cambridge, MA, USA
Date :
5/1/2010
Abstract :
We consider an agent interacting with an unmodeled environment. At each time, the agent makes an observation, takes an action, and incurs a cost. Its actions can influence future observations and costs. The goal is to minimize the long-term average cost. We propose a novel algorithm, known as the active LZ algorithm, for optimal control based on ideas from the Lempel-Ziv scheme for universal data compression and prediction. We establish that, under the active LZ algorithm, if there exists an integer K such that the future is conditionally independent of the past given a window of K consecutive actions and observations, then the average cost converges to the optimum. Experimental results involving the game of Rock-Paper-Scissors illustrate the merits of the algorithm.
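The abstract builds on Lempel-Ziv-style prediction: an incremental parse of the observed sequence grows a context tree whose visit counts yield empirical next-symbol probabilities. The sketch below is NOT the paper's active LZ algorithm (which couples such estimates with dynamic programming over actions); it only illustrates the underlying LZ78-style parsing and prediction idea, with all class and method names (`Node`, `LZ78Predictor`, `update`, `predict`) chosen here for illustration.

```python
class Node:
    """One context in the LZ78 parse tree."""
    __slots__ = ("children", "count")

    def __init__(self):
        self.children = {}  # symbol -> Node
        self.count = 0      # times this context was visited

class LZ78Predictor:
    """LZ78-style incremental parse: each new symbol extends the current
    phrase down the tree; when the phrase falls off the tree, a new leaf
    is added and parsing restarts at the root."""

    def __init__(self):
        self.root = Node()
        self.node = self.root  # current position in the parse tree

    def update(self, symbol):
        child = self.node.children.get(symbol)
        if child is None:
            # Phrase complete: add a new leaf, restart at the root.
            child = Node()
            self.node.children[symbol] = child
            child.count += 1
            self.node = self.root
        else:
            child.count += 1
            self.node = child

    def predict(self):
        """Empirical next-symbol distribution at the current context
        (empty dict if this context has no recorded continuations)."""
        total = sum(c.count for c in self.node.children.values())
        if total == 0:
            return {}
        return {s: c.count / total for s, c in self.node.children.items()}
```

For example, feeding the predictor a repeating Rock-Paper-Scissors sequence (`"RPSRPS..."`) builds up counts along the periodic contexts, so deeper contexts increasingly concentrate probability on the correct next symbol.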
Keywords :
data compression; dynamic programming; learning (artificial intelligence); optimal control; Lempel-Ziv scheme; active LZ algorithm; agent interaction; consecutive actions; consecutive observations; prediction; Rock-Paper-Scissors game; universal reinforcement learning; cost function; information theory; context tree; reinforcement learning; value iteration;
Journal_Title :
IEEE Transactions on Information Theory
DOI :
10.1109/TIT.2010.2043762