Title of article :
Error bounds for constant step-size -learning
Author/Authors :
Beck، نويسنده , , C.L. and Srikant، نويسنده , , R.، نويسنده ,
Issue Information :
ماهنامه با شماره پیاپی سال 2012
Abstract :
We provide a bound on the first moment of the error in the Q -function estimate resulting from fixed step-size algorithms applied to finite state-space, discounted reward Markov decision problems. Motivated by Tsitsiklis’ proof for the decreasing step-size case, we decompose the Q -learning update equations into a dynamical system driven by a noise sequence and another dynamical system whose state variable is the Q -learning error, i.e., the difference between the true Q -function and the estimate. A natural persistence of excitation condition allows us to sample the system periodically and derive a simple scalar difference equation from which the convergence properties and bounds on the error of the Q -learning algorithm can be derived.
Keywords :
Markov decision processes , Stochastic approximation , Q -learning
Journal title :
Systems and Control Letters
Journal title :
Systems and Control Letters