DocumentCode
3268719
Title
A parallel architecture for temporal difference learning with eligibility traces
Author
Turnmire, J. ; Elhanany, I.
fYear
2007
fDate
5-8 Aug. 2007
Firstpage
848
Lastpage
850
Abstract
Temporal difference learning is a central idea in reinforcement learning, commonly employed in a broad range of applications in which rewards are delayed. An agent learns by interacting with its environment and constructs a value function that helps map states to actions. A particularly useful tool in temporal difference learning is eligibility traces, which assist the agent in assigning values to recently visited states. This paper explores the gain attainable by utilizing custom hardware to take advantage of the inherent parallelism found in the TD(lambda) algorithm. The result is a scalable framework for high-speed machine learning applications. To the best of the authors' knowledge, this is the first work that attempts to map tabular-form temporal difference learning with eligibility traces onto digital hardware.
Keywords
learning (artificial intelligence); parallel architectures; digital hardware; eligibility trace; machine learning; parallel architecture; reinforcement learning; tabular-form temporal difference learning; Broadcasting; Hardware; Learning; Linear feedback shift registers; Logic; Parallel architectures; Parallel processing; Performance evaluation; Random sequences; State estimation;
fLanguage
English
Publisher
IEEE
Conference_Titel
Circuits and Systems, 2007. MWSCAS 2007. 50th Midwest Symposium on
Conference_Location
Montreal, Que.
ISSN
1548-3746
Print_ISBN
978-1-4244-1175-7
Electronic_ISBN
1548-3746
Type
conf
DOI
10.1109/MWSCAS.2007.4488705
Filename
4488705