• DocumentCode
    493377
  • Title
    Kalman Temporal Differences: The deterministic case
  • Author
    Geist, Matthieu ; Pietquin, Olivier ; Fricout, Gabriel
  • Author_Institution
    IMS Res. Group, Metz
  • fYear
    2009
  • fDate
    March 30 2009-April 2 2009
  • Firstpage
    185
  • Lastpage
    192
  • Abstract
    This paper deals with value function and Q-function approximation in deterministic Markovian decision processes. A general statistical framework based on the Kalman filtering paradigm is introduced. Its principle is to adopt a parametric representation of the value function, to model the associated parameter vector as a random variable and to minimize the mean-squared error of the parameters conditioned on past observed transitions. From this general framework, which will be called Kalman Temporal Differences (KTD), and using an approximation scheme called the unscented transform, a family of algorithms is derived, namely KTD-V, KTD-SARSA and KTD-Q, which aim respectively at estimating the value function of a given policy, the Q-function of a given policy and the optimal Q-function. The proposed approach holds for linear and nonlinear parameterization. This framework is discussed and potential advantages and shortcomings are highlighted.
  • Keywords
    Kalman filters; Markov processes; approximation theory; mean square error methods; random processes; temporal reasoning; Kalman filtering paradigm; Kalman temporal differences; deterministic Markovian decision processes; function approximation; mean-squared error; nonlinear parameterization; random variable; unscented transform; Approximation algorithms; Dynamic programming; Equations; Error correction; Filtering; Learning; Random variables; State-space methods; Vectors
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Adaptive Dynamic Programming and Reinforcement Learning, 2009. ADPRL '09. IEEE Symposium on
  • Conference_Location
    Nashville, TN
  • Print_ISBN
    978-1-4244-2761-1
  • Type
    conf
  • DOI
    10.1109/ADPRL.2009.4927543
  • Filename
    4927543