• DocumentCode
    3268656
  • Title

    Large-scale tabular-form hardware architecture for Q-Learning with delays

  • Author

    Liu, Zhenzhen ; Elhanany, Itamar

  • Author_Institution
    Univ. of Tennessee, Knoxville
  • fYear
    2007
  • fDate
    5-8 Aug. 2007
  • Firstpage
    827
  • Lastpage
    830
  • Abstract
    Q-Learning is a popular reinforcement learning algorithm that has been widely used in stochastic control applications. The bottleneck in applying tabular-form Q-Learning to reinforcement learning problems with large-scale or high-dimensional action sets is the considerable delay caused by action selection and value function updates. In this paper, we present a novel hardware architecture that significantly reduces these delays. Moreover, we formulate the Q-Learning algorithm for cases of observation and action delays and provide a set of proofs confirming that Q-Learning with such delays converges to the optimal policy.
  • Keywords
    delays; learning (artificial intelligence); Q-learning; large-scale tabular-form hardware; optimal policy; reinforcement learning algorithm; stochastic control; Delay; Hardware; Iron; Large-scale systems
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Circuits and Systems, 2007. MWSCAS 2007. 50th Midwest Symposium on
  • Conference_Location
    Montreal, Que.
  • ISSN
    1548-3746
  • Print_ISBN
    978-1-4244-1175-7
  • Electronic_ISBN
    1548-3746
  • Type

    conf

  • DOI
    10.1109/MWSCAS.2007.4488701
  • Filename
    4488701