Title :
Convergence of the Q-ae learning under deterministic MDPs and its efficiency under the stochastic environment
Author :
Zhao, Gang ; Sun, Ruoying ; Tatsumi, Shoji
Author_Institution :
Fujitsu Kansai-Chubu Net-Tech Ltd., Osaka, Japan
Abstract :
Reinforcement learning (RL) is an efficient method for solving Markov decision processes (MDPs) without a priori knowledge of the environment. Q-learning is a representative RL method. Although it is guaranteed to derive the optimal policy, Q-learning needs numerous trials to learn it. Exploiting a feature of the Q value, this paper presents an accelerated RL method, Q-ae learning. Further, using the dynamic programming principle, this paper proves the convergence of Q-ae learning to the optimal policy under deterministic MDPs. Analytical and simulation results illustrate the efficiency of Q-ae learning under both deterministic and stochastic MDPs.
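The abstract does not detail the Q-ae acceleration itself; as background, here is a minimal sketch of the standard tabular Q-learning update the paper builds on, run on a small deterministic chain MDP. The environment, hyperparameters, and reward scheme are illustrative assumptions, not taken from the paper:

```python
import random

# A tiny deterministic chain MDP: states 0..3, state 3 is terminal.
# Action 0 moves left, action 1 moves right; reward 1 on reaching state 3.
N_STATES, TERMINAL = 4, 3

def step(s, a):
    s2 = max(0, s - 1) if a == 0 else min(TERMINAL, s + 1)
    return s2, (1.0 if s2 == TERMINAL else 0.0)

def q_learning(episodes=200, gamma=0.9, alpha=1.0, eps=0.3, seed=0):
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        s = 0
        for _ in range(100):  # step cap guards against long exploratory episodes
            # epsilon-greedy action selection
            a = rng.randrange(2) if rng.random() < eps else max((0, 1), key=lambda x: Q[s][x])
            s2, r = step(s, a)
            # Standard Q-learning update; alpha = 1 suffices in a deterministic MDP.
            Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
            s = s2
            if s == TERMINAL:
                break
    return Q

Q = q_learning()
policy = [max((0, 1), key=lambda a: Q[s][a]) for s in range(TERMINAL)]
print(policy)  # → [1, 1, 1]: the learned greedy policy moves right in every state
```

In the deterministic setting a learning rate of 1 makes each update an exact Bellman backup, which is why the Q values reach their fixed point (here 1, 0.9, and 0.81 along the chain) after only a few episodes; the paper's contribution concerns accelerating this process further.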
Keywords :
Markov processes; decision theory; dynamic programming; learning (artificial intelligence); Q-ae learning; convergence; deterministic Markov decision processes; dynamic programming; optimal policy; reinforcement learning; simulation; stochastic Markov decision processes; Acceleration; Analytical models; Convergence; Dynamic programming; Probability distribution; State-space methods; Stochastic processes; Time factors;
Conference_Titel :
2000 IEEE International Conference on Systems, Man, and Cybernetics
Conference_Location :
Nashville, TN
Print_ISBN :
0-7803-6583-6
DOI :
10.1109/ICSMC.2000.884985