DocumentCode :
3268719
Title :
A parallel architecture for temporal difference learning with eligibility traces
Author :
Turnmire, J. ; Elhanany, I.
fYear :
2007
fDate :
5-8 Aug. 2007
Firstpage :
848
Lastpage :
850
Abstract :
Temporal difference learning is a central idea in reinforcement learning and is employed in a broad range of applications that involve delayed rewards. An agent learns by interacting with its environment and constructs a value function that helps map states to actions. Eligibility traces are a particularly useful tool in temporal difference learning: they help the agent assign values to recently visited states. This paper explores the gain attainable by using custom hardware to exploit the inherent parallelism of the TD(lambda) algorithm. The result is a scalable framework for high-speed machine learning applications. To the best of the authors' knowledge, this is the first work that attempts to map tabular-form temporal difference learning with eligibility traces onto digital hardware.
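The parallelism mentioned in the abstract can be illustrated with a small software sketch of tabular TD(lambda) policy evaluation using accumulating eligibility traces. This is the textbook algorithm, not the authors' hardware architecture. The function name td_lambda_episode, the transitions callable, and the default values of alpha, gamma and lam are illustrative assumptions. The point it shows is that, once the scalar TD error delta is known, every state's value and trace update depends only on that state's own entries, so all states can be updated at the same time.

```python
def td_lambda_episode(values, transitions, start, alpha=0.1, gamma=0.9, lam=0.8):
    """Run one episode of tabular TD(lambda); update `values` in place.

    Illustrative sketch only (not the paper's hardware design).
    values      -- list of state-value estimates V(s), one entry per state
    transitions -- hypothetical callable: state -> (next_state, reward, done)
    start       -- index of the initial state
    """
    traces = [0.0] * len(values)  # eligibility trace e(s) for every state
    state = start
    done = False
    while not done:
        next_state, reward, done = transitions(state)
        # TD error: delta = r + gamma * V(s') - V(s), with V(terminal) = 0
        target = reward if done else reward + gamma * values[next_state]
        delta = target - values[state]
        traces[state] += 1.0  # accumulating trace for the state just visited
        # Each iteration touches only V(s) and e(s) plus the shared scalar
        # delta, so the iterations are independent of one another. This
        # per-state independence is what parallel hardware can exploit.
        for s in range(len(values)):
            values[s] += alpha * delta * traces[s]
            traces[s] *= gamma * lam  # decay every trace by gamma * lambda
        state = next_state
    return values
```

As a usage example, consider a deterministic three-state chain 0 -> 1 -> 2 in which only the final transition out of state 2 yields a reward of 1 and ends the episode. Running a few thousand episodes with gamma = 0.9 drives the estimates toward the true values [0.81, 0.9, 1.0]. In the first episode, the single reward already updates all three states, weighted by their decayed traces.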
Keywords :
learning (artificial intelligence); parallel architectures; digital hardware; eligibility trace; machine learning; parallel architecture; reinforcement learning; tabular-form temporal difference learning; Broadcasting; Hardware; Learning; Linear feedback shift registers; Logic; Parallel architectures; Parallel processing; Performance evaluation; Random sequences; State estimation;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Circuits and Systems, 2007. MWSCAS 2007. 50th Midwest Symposium on
Conference_Location :
Montreal, Que.
ISSN :
1548-3746
Print_ISBN :
978-1-4244-1175-7
Electronic_ISBN :
1548-3746
Type :
conf
DOI :
10.1109/MWSCAS.2007.4488705
Filename :
4488705