Title :
Online learning in Markov decision processes with arbitrarily changing rewards and transitions
Author :
Yu, Jia Yuan ; Mannor, Shie
Author_Institution :
Dept. of Electr. & Comput. Eng., McGill Univ., Montreal, QC, Canada
Abstract :
We consider decision-making problems in Markov decision processes where both the rewards and the transition probabilities vary in an arbitrary (e.g., non-stationary) fashion. We present algorithms that combine online learning and robust control, and establish guarantees on their performance evaluated in retrospect against alternative policies, i.e., their regret. These guarantees depend critically on the range of uncertainty in the transition probabilities, but hold regardless of the changes in rewards and transition probabilities over time. We present a version of the main algorithm in the setting where the decision-maker's observations are limited to its trajectory, and another version that allows a trade-off between performance and computational complexity.
Keywords :
Markov processes; computer aided instruction; decision making; decision theory; probability; Markov decision process; arbitrarily changing reward; decision-making; online learning; transition probability; Computational complexity; Control systems; Decision making; Dynamic programming; Game theory; Robust control; Robustness; Scholarships; Stochastic processes; Uncertainty;
Conference_Title :
International Conference on Game Theory for Networks (GameNets '09), 2009
Conference_Location :
Istanbul
Print_ISBN :
978-1-4244-4176-1
Electronic_ISBN :
978-1-4244-4177-8
DOI :
10.1109/GAMENETS.2009.5137416