Title :
Free energy based policy gradients
Author :
Theodorou, Evangelos A. ; Najemnik, Jiri ; Todorov, Emo
Author_Institution :
Dept. of Comput. Sci. & Eng., Univ. of Washington, Seattle, WA, USA
Abstract :
Despite the plethora of reinforcement learning algorithms in machine learning and control, the majority of work in this area relies on discrete-time formulations of stochastic dynamics. In this work we present a new policy gradient algorithm for reinforcement learning in continuous state-action spaces and continuous time, for free-energy-like cost functions. The derivation is based on successive applications of Girsanov's theorem and on the Radon-Nikodým derivative as formulated for Markov diffusion processes. The resulting policy gradient is reward-weighted. The use of the Radon-Nikodým derivative extends the analysis and results to more general models of stochasticity in which jump diffusion processes are considered. We apply the resulting algorithm to two simple examples of learning attractor landscapes in rhythmic and discrete movements.
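Illustrative sketch (not from the paper): the abstract's central object is a reward-weighted policy gradient in which trajectory costs enter through an exponentiated, free-energy-style weighting. The minimal Python sketch below shows that structure for a hypothetical 1-D linear-Gaussian system with quadratic cost; all symbols (theta, sigma, lam) and the update rule are illustrative assumptions, not the authors' algorithm.

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup (illustrative only): 1-D point mass,
# Gaussian policy u ~ N(theta * x, sigma^2), quadratic running cost.
dt, T, K = 0.01, 100, 64   # time step, horizon, rollouts per iteration
lam = 1.0                  # temperature in the free-energy (soft-max) weighting
sigma = 0.5                # exploration / process noise scale
theta = 0.0                # scalar policy parameter to learn

def rollout(theta):
    """One noisy trajectory: return its total cost and the accumulated
    score function sum_t d log pi(u_t | x_t) / d theta."""
    x, cost, score = 1.0, 0.0, 0.0
    for _ in range(T):
        eps = rng.normal()
        u = theta * x + sigma * eps
        score += eps * x / sigma          # d log N(u; theta*x, sigma^2) / d theta
        cost += 0.5 * (x**2 + 0.1 * u**2) * dt
        x += u * dt + sigma * np.sqrt(dt) * rng.normal()
    return cost, score

for it in range(200):
    costs, scores = map(np.array, zip(*(rollout(theta) for _ in range(K))))
    w = np.exp(-(costs - costs.min()) / lam)   # exponentiated negative cost
    w /= w.sum()                               # normalized free-energy-style weights
    theta += 0.1 * float(np.sum(w * scores))   # reward-weighted gradient step

Low-cost trajectories dominate the weights w, so the update pushes theta toward controls that the soft-max over costs favors; this mirrors the reward-weighted form the abstract describes, without reproducing the continuous-time derivation.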
Keywords :
Markov processes; continuous time systems; gradient methods; learning (artificial intelligence); nonlinear dynamical systems; Girsanov's theorem; Markov diffusion processes; Radon-Nikodým derivative; attractor landscapes; continuous state-action spaces; continuous time; discrete movements; discrete-time formulations; free-energy-based policy gradient algorithm; free-energy-like cost functions; jump diffusion processes; reinforcement learning; reward-weighted gradient; rhythmic movements; stochastic dynamics; Cost function; Diffusion processes; Equations; Heuristic algorithms; Learning (artificial intelligence); Markov processes
Conference_Titel :
2013 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)
Conference_Location :
Singapore
DOI :
10.1109/ADPRL.2013.6614998