Title :
Bias-corrected Q-learning to control max-operator bias in Q-learning
Author :
Lee, Daewoo ; Defourny, Boris ; Powell, Warren B.
Author_Institution :
Dept. of Comput. Sci., Princeton Univ., Princeton, NJ, USA
Abstract :
We identify a class of stochastic control problems with highly random rewards and a high discount factor, which induce high levels of statistical error in the estimated action-value function. This error produces significant max-operator bias in Q-learning, which can cause the algorithm to diverge for millions of iterations. We present a bias-corrected Q-learning algorithm with asymptotically unbiased resistance against the max-operator bias, and show that the algorithm asymptotically converges to the optimal policy, as Q-learning does. We show experimentally that bias-corrected Q-learning performs well in a domain with highly random rewards, where Q-learning and other related algorithms suffer from the max-operator bias.
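The max-operator bias named in the abstract can be illustrated with a small simulation. The sketch below is not the paper's bias-corrected algorithm; it only shows, under assumed settings, that taking the maximum over noisy action-value estimates overestimates the true maximum, and that the overestimate grows with reward noise. The function name `max_operator_bias` and all parameter values are hypothetical choices made for this illustration.

```python
import random


def max_operator_bias(num_actions=10, samples_per_action=5,
                      reward_std=1.0, trials=2000, seed=0):
    """Monte Carlo estimate of E[max_a Q_hat(a)] - max_a Q(a).

    Assumption for illustration: every action has true value 0, so the
    true maximum is 0 and any positive result is pure max-operator bias.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        estimates = []
        for _ in range(num_actions):
            # Sample-mean estimate of one action value from noisy rewards.
            rewards = [rng.gauss(0.0, reward_std)
                       for _ in range(samples_per_action)]
            estimates.append(sum(rewards) / samples_per_action)
        # The max over noisy estimates is what the Q-learning target uses.
        total += max(estimates)
    return total / trials


bias_low_noise = max_operator_bias(reward_std=0.1)
bias_high_noise = max_operator_bias(reward_std=10.0)
print("bias with low reward noise: ", bias_low_noise)
print("bias with high reward noise:", bias_high_noise)
```

Both printed values are positive even though every true action value is zero, and the high-noise value is much larger. In Q-learning this overestimate enters each update target and is propagated through the discount factor, which is why the abstract singles out highly random rewards combined with a high discount factor.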
Keywords :
learning (artificial intelligence); stochastic systems; action-value function estimation; asymptotically unbiased resistance; bias-corrected Q-learning algorithm; discount factor; max-operator bias control; optimal policy; statistical error; stochastic control problems; Convergence; Dynamic programming; Educational institutions; Learning (artificial intelligence); Random variables; Reactive power; Standards
Conference_Title :
Adaptive Dynamic Programming And Reinforcement Learning (ADPRL), 2013 IEEE Symposium on
Conference_Location :
Singapore
DOI :
10.1109/ADPRL.2013.6614994