Title :
A parallel architecture for temporal difference learning with eligibility traces
Author :
Turnmire, J. ; Elhanany, I.
Abstract :
Temporal difference learning is a central idea in reinforcement learning, commonly employed in a broad range of applications in which rewards are delayed. An agent learns by interacting with its environment and constructs a value function which helps map states to actions. A particularly useful tool in temporal difference learning is the eligibility trace, which assists the agent in assigning values to recently visited states. This paper explores the gain attainable by utilizing custom hardware to take advantage of the inherent parallelism found in the TD(lambda) algorithm. The result is a scalable framework for high-speed machine learning applications. To the best of the authors' knowledge, this is the first work that attempts to map tabular-form temporal difference learning with eligibility traces onto digital hardware.
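As an illustration of the parallelism the abstract refers to, the following is a minimal sketch of one tabular TD(lambda) update with accumulating eligibility traces. It is a textbook-style software version, not the hardware architecture described in the paper; the function name td_lambda_step, the parameter names, and the default values of alpha, gamma and lam are assumptions chosen for this example.

```python
def td_lambda_step(values, traces, state, reward, next_state,
                   alpha=0.1, gamma=0.9, lam=0.8):
    """Apply one tabular TD(lambda) update in place and return the TD error.

    All names and default parameters here are illustrative assumptions.
    """
    # TD error for the observed transition.
    delta = reward + gamma * values[next_state] - values[state]
    # Accumulating trace: mark the state that was just visited.
    traces[state] += 1.0
    # Every table entry is updated from the same delta and its own trace,
    # so the iterations of this loop are independent of one another.
    for s in range(len(values)):
        values[s] += alpha * delta * traces[s]
        traces[s] *= gamma * lam
    return delta


# Two transitions on a three-state chain: 0 -> 1 (reward 0), 1 -> 2 (reward 1).
values = [0.0, 0.0, 0.0]
traces = [0.0, 0.0, 0.0]
td_lambda_step(values, traces, state=0, reward=0.0, next_state=1)
td_lambda_step(values, traces, state=1, reward=1.0, next_state=2)
```

After the second step, state 1 receives the full update (0.1) while state 0, visited one step earlier, receives a decayed share (about 0.072) through its trace. Because each entry depends only on the shared TD error and its own trace, the per-state loop is the kind of work that could in principle be distributed across parallel units.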
Keywords :
learning (artificial intelligence); parallel architectures; digital hardware; eligibility trace; machine learning; parallel architecture; reinforcement learning; tabular-form temporal difference learning; Broadcasting; Hardware; Learning; Linear feedback shift registers; Logic; Parallel architectures; Parallel processing; Performance evaluation; Random sequences; State estimation;
Conference_Title :
Circuits and Systems, 2007. MWSCAS 2007. 50th Midwest Symposium on
Conference_Location :
Montreal, Que.
Print_ISBN :
978-1-4244-1175-7
Electronic_ISBN :
1548-3746
DOI :
10.1109/MWSCAS.2007.4488705