• DocumentCode
    3268719
  • Title
    A parallel architecture for temporal difference learning with eligibility traces
  • Author
    Turnmire, J.; Elhanany, I.
  • fYear
    2007
  • fDate
    5-8 Aug. 2007
  • Firstpage
    848
  • Lastpage
    850
  • Abstract
    Temporal difference learning is a central idea in reinforcement learning, commonly employed in a broad range of applications in which rewards are delayed. An agent learns by interacting with its environment and constructs a value function which helps map states to actions. A particularly useful tool in temporal difference learning is eligibility traces, which assist the agent in assigning values to recently visited states. This paper explores the gain attainable by utilizing custom hardware to take advantage of the inherent parallelism found in the TD(lambda) algorithm. The result is a scalable framework for high-speed machine learning applications. To the best of the authors' knowledge, this is the first work that attempts to map tabular-form temporal difference learning with eligibility traces onto digital hardware.
  • Keywords
    learning (artificial intelligence); parallel architectures; digital hardware; eligibility trace; machine learning; parallel architecture; reinforcement learning; tabular-form temporal difference learning; Broadcasting; Hardware; Learning; Linear feedback shift registers; Logic; Parallel architectures; Parallel processing; Performance evaluation; Random sequences; State estimation;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Circuits and Systems, 2007. MWSCAS 2007. 50th Midwest Symposium on
  • Conference_Location
    Montreal, Que.
  • ISSN
    1548-3746
  • Print_ISBN
    978-1-4244-1175-7
  • Electronic_ISBN
    1548-3746
  • Type
    conf
  • DOI
    10.1109/MWSCAS.2007.4488705
  • Filename
    4488705