Title :
Continuous Action-Space Reinforcement Learning Methods Applied to the Minimum-Time Swing-Up of the Acrobot
Author :
Barry D. Nichols
Author_Institution :
Sch. of Sci. &
Abstract :
Here I apply three reinforcement learning methods to the full, continuous-action, swing-up acrobot control benchmark problem. These include two approaches from the literature, CACLA and NM-SARSA, and a novel approach which I refer to as Nelder Mead-SARSA. Nelder Mead-SARSA, like NM-SARSA, directly optimises the state-action value function for action selection, in order to allow continuous-action reinforcement learning without a separate policy function. However, as it uses a derivative-free method, it does not require the first or second partial derivatives of the value function. All three methods achieved good results in terms of swing-up times, comparable to previous approaches from the literature. Nelder Mead-SARSA in particular performed the swing-up in a shorter time than many approaches from the literature.
Keywords :
"Learning (artificial intelligence)","Training","Mathematical model","Switches","Reactive power","Newton method","Optimization"
Conference_Title :
Systems, Man, and Cybernetics (SMC), 2015 IEEE International Conference on
DOI :
10.1109/SMC.2015.364
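
Illustrative sketch (not taken from the paper): the abstract describes selecting actions by directly optimising the state-action value function with a derivative-free method. The Python below is a minimal sketch of that idea, assuming a scalar action bounded to [-1, 1] and a hypothetical toy value function `toy_q`. The function `nelder_mead_max`, its parameters, and the simplified Nelder-Mead variant (reflection, expansion, inside contraction, shrink) are assumptions made for illustration and may differ from the author's implementation.

```python
from typing import Callable, List, Sequence


def nelder_mead_max(f: Callable[[List[float]], float], x0: Sequence[float],
                    step: float = 0.5, lo: float = -1.0, hi: float = 1.0,
                    iters: int = 50) -> List[float]:
    """Maximise f over the box [lo, hi]^n with a simplified Nelder-Mead search."""
    n = len(x0)

    def clip(x):
        return [min(hi, max(lo, v)) for v in x]

    def cost(x):
        # Minimise the negated value; out-of-range points are clipped first.
        return -f(clip(x))

    # Initial simplex: x0 plus one vertex offset along each axis.
    simplex = [list(x0)] + [
        [v + (step if i == j else 0.0) for j, v in enumerate(x0)]
        for i in range(n)
    ]
    for _ in range(iters):
        simplex.sort(key=cost)
        best, worst = simplex[0], simplex[-1]
        centroid = [sum(p[j] for p in simplex[:-1]) / n for j in range(n)]

        def along(t):
            # Point on the line through the centroid and the worst vertex
            # (t > 0 moves away from the worst vertex, t < 0 towards it).
            return [c + t * (c - w) for c, w in zip(centroid, worst)]

        reflected = along(1.0)
        if cost(reflected) < cost(best):
            expanded = along(2.0)
            simplex[-1] = expanded if cost(expanded) < cost(reflected) else reflected
        elif cost(reflected) < cost(simplex[-2]):
            simplex[-1] = reflected
        else:
            contracted = along(-0.5)
            if cost(contracted) < cost(worst):
                simplex[-1] = contracted
            else:
                # Shrink every other vertex halfway towards the best one.
                simplex = [best] + [
                    [b + 0.5 * (v - b) for b, v in zip(best, p)]
                    for p in simplex[1:]
                ]
    return clip(min(simplex, key=cost))


def toy_q(state: Sequence[float], action: Sequence[float]) -> float:
    # Hypothetical value function whose best action is 0.3 * state[0].
    return -(action[0] - 0.3 * state[0]) ** 2


state = [1.0]
# Greedy action: maximise Q(state, a) over a, using no derivatives of Q.
greedy_action = nelder_mead_max(lambda a: toy_q(state, a), x0=[0.0])
print(greedy_action)
```

In a SARSA-style learner, a search like this would stand in for the separate policy function: the greedy action is recomputed from the current value estimate at every step, and exploration noise could be added to the returned action.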