DocumentCode
1476046
Title
Universal Reinforcement Learning
Author
Farias, Vivek F. ; Moallemi, Ciamac C. ; Van Roy, Benjamin ; Weissman, Tsachy
Author_Institution
Sloan School of Management, Massachusetts Institute of Technology, Cambridge, MA, USA
Volume
56
Issue
5
fYear
2010
fDate
5/1/2010
Firstpage
2441
Lastpage
2454
Abstract
We consider an agent interacting with an unmodeled environment. At each time step, the agent makes an observation, takes an action, and incurs a cost. Its actions can influence future observations and costs. The goal is to minimize the long-term average cost. We propose a novel algorithm, the active LZ algorithm, for optimal control based on ideas from the Lempel-Ziv scheme for universal data compression and prediction. We establish that, under the active LZ algorithm, if there exists an integer K such that the future is conditionally independent of the past given a window of K consecutive actions and observations, then the average cost converges to the optimum. Experimental results involving the game of Rock-Paper-Scissors illustrate the merits of the algorithm.
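Note: the following Python snippet is a minimal, hypothetical sketch of the LZ78-style context-tree prediction idea that the abstract alludes to, not the paper's active LZ algorithm itself. The actual algorithm parses the joint action-observation sequence, attaches cost estimates to the tree, and runs a dynamic-programming (value-iteration) step to select actions; this sketch shows only how a Lempel-Ziv parse yields conditional next-symbol estimates. The class name, symbol alphabet, and add-one smoothing are assumptions introduced here for illustration.

    from collections import defaultdict

    class LZContextTree:
        """LZ78-style context tree with visit counts (illustrative sketch)."""

        def __init__(self):
            self.children = defaultdict(dict)  # node id -> {symbol: child node id}
            self.counts = defaultdict(int)     # node id -> number of visits
            self.next_id = 1                   # node 0 is the root
            self.node = 0                      # position within the current phrase

        def update(self, symbol):
            """Consume one symbol, growing the LZ78 parse tree incrementally."""
            self.counts[self.node] += 1
            child = self.children[self.node].get(symbol)
            if child is None:
                # Current phrase is new: add a leaf and restart at the root.
                self.children[self.node][symbol] = self.next_id
                self.next_id += 1
                self.node = 0
            else:
                self.node = child

        def predict(self, alphabet):
            """Estimate P(next symbol | current context) from child visit
            counts, with add-one smoothing over the alphabet (an assumption
            made here, not taken from the paper)."""
            kids = self.children[self.node]
            total = sum(self.counts[c] for c in kids.values()) + len(alphabet)
            return {s: (self.counts[kids[s]] + 1) / total if s in kids
                       else 1 / total
                    for s in alphabet}

For instance, in the Rock-Paper-Scissors experiment mentioned above, the symbols could be the opponent's moves:

    tree = LZContextTree()
    for move in ["R", "P", "S", "R", "P"]:
        tree.update(move)
    print(tree.predict(["R", "P", "S"]))  # {'R': 0.5, 'P': 0.25, 'S': 0.25}

In the paper's setting the symbols would instead be joint (action, observation) pairs, and the resulting transition estimates would feed a value-iteration computation rather than a raw prediction.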
Keywords
data compression; dynamic programming; learning (artificial intelligence); optimal control; Lempel-Ziv scheme; active LZ algorithm; agent interaction; prediction; rock-paper-scissors game; universal reinforcement learning; context tree; reinforcement learning; value iteration;
fLanguage
English
Journal_Title
IEEE Transactions on Information Theory
Publisher
IEEE
ISSN
0018-9448
Type
jour
DOI
10.1109/TIT.2010.2043762
Filename
5452185
Link To Document