DocumentCode
2226545
Title
Direct gradient-based reinforcement learning
Author
Baxter, Jonathan ; Bartlett, Peter L.
Author_Institution
Res. Sch. of Inf. Sci. & Eng., Australian Nat. Univ., Canberra, ACT, Australia
Volume
3
fYear
2000
fDate
2000
Firstpage
271
Abstract
Many control, scheduling, planning and game-playing tasks can be formulated as reinforcement learning problems, in which an agent chooses actions to take in some environment, aiming to maximize a reward function. We present an algorithm for computing approximations to the gradient of the average reward from a single sample path of a controlled partially observable Markov decision process. We show that the accuracy of these approximations depends on the relationship between a time constant used by the algorithm and the mixing time of the Markov chain, and that the error can be made arbitrarily small by setting the time constant suitably large. We prove that the algorithm converges with probability 1.
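The abstract describes estimating the gradient of the average reward from a single sample path, with a time constant trading approximation error against variance. A minimal illustrative sketch of this style of estimator is given below for a two-action softmax policy on a simple bandit-like problem; the function names, the discount parameter `beta` standing in for the paper's time constant, and the eligibility-trace form are assumptions for illustration, not the paper's exact algorithm.

```python
import numpy as np

def softmax(theta):
    """Softmax policy over actions parameterized by theta."""
    e = np.exp(theta - theta.max())
    return e / e.sum()

def gradient_estimate(theta, rewards, T=200_000, beta=0.9, seed=0):
    """Estimate the gradient of the average reward from one sample path.

    Maintains a discounted eligibility trace of score functions
    (grad log pi); beta plays the role of the time constant: larger
    beta shrinks the bias for slowly mixing chains at the cost of
    higher variance. This is an illustrative sketch, not the paper's
    algorithm verbatim.
    """
    rng = np.random.default_rng(seed)
    z = np.zeros_like(theta)       # eligibility trace
    delta = np.zeros_like(theta)   # running average gradient estimate
    for t in range(T):
        p = softmax(theta)
        a = rng.choice(len(theta), p=p)
        grad_log_pi = -p.copy()    # grad of log pi(a | theta)
        grad_log_pi[a] += 1.0
        z = beta * z + grad_log_pi
        delta += (rewards[a] * z - delta) / (t + 1)
    return delta
```

For a memoryless reward (mixing time 1) with deterministic rewards `[1.0, 0.0]` and a uniform policy, the true average-reward gradient is `[0.25, -0.25]`, and the single-path estimate converges toward it as the path length grows.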
Keywords
Markov processes; game theory; gradient methods; learning (artificial intelligence); probability; Markov chain; agent; direct gradient-based reinforcement learning; game-playing tasks; mixing time; partially observable Markov decision process; reward function; sample path; time constant; Adaptive control; Approximation algorithms; Convergence; Discrete event systems; Equations; Learning; Probability distribution; Processor scheduling; State-space methods; Stochastic processes
fLanguage
English
Publisher
ieee
Conference_Titel
Proceedings of the 2000 IEEE International Symposium on Circuits and Systems (ISCAS 2000), Geneva
Conference_Location
Geneva
Print_ISBN
0-7803-5482-6
Type
conf
DOI
10.1109/ISCAS.2000.856049
Filename
856049