• DocumentCode
    2226545
  • Title
    Direct gradient-based reinforcement learning
  • Author
    Baxter, Jonathan; Bartlett, Peter L.
  • Author_Institution
    Research School of Information Sciences and Engineering, Australian National University, Canberra, ACT, Australia
  • Volume
    3
  • fYear
    2000
  • fDate
    2000
  • Firstpage
    271
  • Abstract
    Many control, scheduling, planning, and game-playing tasks can be formulated as reinforcement learning problems, in which an agent chooses actions to take in some environment, aiming to maximize a reward function. We present an algorithm for computing approximations to the gradient of the average reward from a single sample path of a controlled partially observable Markov decision process. We show that the accuracy of these approximations depends on the relationship between a time constant used by the algorithm and the mixing time of the Markov chain, and that the error can be made arbitrarily small by setting the time constant suitably large. We prove that the algorithm converges with probability 1. (A minimal sketch of this style of gradient estimator follows the record below.)
  • Keywords
    Markov processes; game theory; gradient methods; learning (artificial intelligence); probability; Markov chain; agent; direct gradient-based reinforcement learning; game-playing tasks; mixing time; partially observable Markov decision process; reward function; sample path; time constant; Adaptive control; Approximation algorithms; Convergence; Discrete event systems; Equations; Learning; Probability distribution; Processor scheduling; State-space methods; Stochastic processes
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Proceedings of the 2000 IEEE International Symposium on Circuits and Systems (ISCAS 2000), Geneva
  • Conference_Location
    Geneva
  • Print_ISBN
    0-7803-5482-6
  • Type
    conf
  • DOI
    10.1109/ISCAS.2000.856049
  • Filename
    856049
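
The abstract describes a single-sample-path estimator of the average-reward gradient for a parameterized stochastic policy, in the spirit of the authors' GPOMDP algorithm. Below is a minimal illustrative sketch of that style of estimator on a toy fully observed two-state, two-action chain; the chain, the logistic policy, and all names (`P`, `r`, `gradient_estimate`, `beta`) are assumptions made for this example, not taken from the paper. The discount factor `beta` plays the role of the abstract's time constant: pushing it toward 1 shrinks the bias relative to the chain's mixing time, at the cost of higher variance.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 2-state, 2-action controlled Markov chain (illustrative only).
# P[a, s] is the distribution over next states given action a in state s.
P = np.array([[[0.9, 0.1],
               [0.1, 0.9]],
              [[0.1, 0.9],
               [0.9, 0.1]]])
r = np.array([1.0, -1.0])          # reward attached to each state

def policy(theta, s):
    """Bernoulli policy: pi(a=1 | s) = sigmoid(theta[s])."""
    p1 = 1.0 / (1.0 + np.exp(-theta[s]))
    return np.array([1.0 - p1, p1])

def grad_log_policy(theta, s, a):
    """Gradient of log pi(a | s; theta) for the logistic policy."""
    p1 = 1.0 / (1.0 + np.exp(-theta[s]))
    g = np.zeros_like(theta)
    g[s] = a - p1                  # standard Bernoulli-logistic score
    return g

def gradient_estimate(theta, beta, T):
    """Single-sample-path estimate of the average-reward gradient.

    z is an eligibility trace discounted by beta in [0, 1); delta is
    the running average of reward-weighted traces. The closer beta is
    to 1 relative to the chain's mixing time, the smaller the bias of
    delta (and the larger its variance).
    """
    s = 0
    z = np.zeros_like(theta)
    delta = np.zeros_like(theta)
    for t in range(T):
        a = rng.choice(2, p=policy(theta, s))
        z = beta * z + grad_log_policy(theta, s, a)
        s = rng.choice(2, p=P[a, s])
        delta += (r[s] * z - delta) / (t + 1)   # running average
    return delta

print(gradient_estimate(theta=np.zeros(2), beta=0.9, T=100_000))
```

A gradient-ascent loop would then repeat `theta += step * gradient_estimate(theta, beta, T)`; the abstract's accuracy claim corresponds to the bias of this estimate vanishing as `beta` is taken suitably close to 1.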