DocumentCode
3268656
Title
Large-scale tabular-form hardware architecture for Q-Learning with delays
Author
Liu, Zhenzhen ; Elhanany, Itamar
Author_Institution
Univ. of Tennessee, Knoxville
fYear
2007
fDate
5-8 Aug. 2007
Firstpage
827
Lastpage
830
Abstract
Q-Learning is a popular reinforcement learning algorithm that has been widely used in stochastic control applications. The bottleneck in applying tabular-form Q-Learning to reinforcement learning problems with large-scale or high-dimensional action sets is the considerable delay caused by action selection and value-function updates. In this paper, we present a novel hardware architecture that significantly reduces these delays. Moreover, we formulate the Q-Learning algorithm for the case of observation and action delays and provide a set of proofs confirming that Q-Learning with such delays converges to the optimal policy.
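For context, the two per-step operations the abstract identifies as the bottleneck are the greedy argmax over the action set and the value-function update. Below is a minimal tabular Q-Learning sketch of those operations (illustrative only; function names and hyperparameters are assumptions, not taken from the paper):

import numpy as np

def select_action(Q, s, epsilon=0.1, rng=None):
    # Epsilon-greedy selection; the argmax scans the entire action set,
    # which is the action-selection delay the abstract refers to.
    rng = rng or np.random.default_rng()
    if rng.random() < epsilon:
        return int(rng.integers(Q.shape[1]))
    return int(np.argmax(Q[s]))

def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    # One tabular update: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)).
    # The max over the next state's row is the value-function-update delay.
    td_target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (td_target - Q[s, a])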
Keywords
delays; learning (artificial intelligence); Q-learning; large-scale tabular-form hardware; optimal policy; reinforcement learning algorithm; stochastic control; hardware; large-scale systems
fLanguage
English
Publisher
ieee
Conference_Titel
2007 50th Midwest Symposium on Circuits and Systems (MWSCAS 2007)
Conference_Location
Montreal, Que.
ISSN
1548-3746
Print_ISBN
978-1-4244-1175-7
Electronic_ISBN
1548-3746
Type
conf
DOI
10.1109/MWSCAS.2007.4488701
Filename
4488701