Title :
Integral Reinforcement Learning for Continuous-Time Input-Affine Nonlinear Systems With Simultaneous Invariant Explorations
Author :
Jae Young Lee ; Jin Bae Park ; Yoon Ho Choi
Author_Institution :
Dept. of Electr. & Electron. Eng., Yonsei Univ., Seoul, South Korea
Abstract :
This paper focuses on a class of reinforcement learning (RL) algorithms, named integral RL (I-RL), that solve continuous-time (CT) nonlinear optimal control problems with input-affine system dynamics. First, we extend the concepts of exploration, integral temporal difference, and invariant admissibility to the target CT nonlinear system, which is governed by a control policy plus a probing signal called an exploration. Then, we show input-to-state stability (ISS) and invariant admissibility of the closed-loop systems under the policies generated by the integral policy iteration (I-PI) or invariantly admissible PI (IA-PI) method. Based on these results, three online I-RL algorithms, named explorized I-PI and integral $Q$-learning I and II, are proposed, all of which generate the same convergent sequences as I-PI and IA-PI under the required excitation condition on the exploration. All of the proposed methods are partially or completely model free, and can simultaneously explore the state space in a stable manner during the online learning process. ISS, invariant admissibility, and convergence properties of the proposed methods are investigated, and in relation to these, we present the design principles of the exploration for safe learning. Neural-network-based implementation methods for the proposed schemes are also presented. Finally, several numerical simulations are carried out to verify the effectiveness of the proposed methods.
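For context, a minimal sketch of the integral policy iteration (I-PI) recursion that this family of methods builds on, stated under the standard integral RL assumptions (Vrabie and Lewis): input-affine dynamics $\dot{x} = f(x) + g(x)u$, a cost integrand $Q(x) + u^{\top}R u$, and a reinforcement interval $T > 0$; the symbols are illustrative choices, not notation taken from this record. Policy evaluation solves the integral temporal-difference relation

$$V_i\bigl(x(t)\bigr) = \int_{t}^{t+T} \Bigl( Q\bigl(x(\tau)\bigr) + u_i^{\top}(\tau)\, R\, u_i(\tau) \Bigr)\, \mathrm{d}\tau + V_i\bigl(x(t+T)\bigr),$$

followed by the policy improvement step

$$u_{i+1}(x) = -\tfrac{1}{2}\, R^{-1} g^{\top}(x)\, \nabla V_i(x).$$

Note that the evaluation step requires no knowledge of the drift dynamics $f(x)$, which is the source of the partially model-free property mentioned in the abstract; the algorithms proposed here additionally inject an exploration signal on top of $u_i$ while preserving stability.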
Keywords :
closed loop systems; continuous time systems; convergence of numerical methods; iterative methods; learning (artificial intelligence); neurocontrollers; nonlinear control systems; optimal control; stability; CT nonlinear system; IA-PI method; closed-loop systems; completely model free; continuous-time input-affine nonlinear systems; continuous-time nonlinear optimal control problems; convergent sequence generation; explorized I-PI algorithm; input-affine system dynamics; input-to-state stability; integral Q-learning algorithm; integral policy iteration; integral reinforcement learning algorithm; integral temporal difference; invariantly admissible PI method; neural-network-based implementation methods; numerical simulations; partially model free; probing signal; simultaneous invariant explorations; Convergence; Equations; Heuristic algorithms; Nonlinear systems; Optimal control; Stability analysis; Adaptive optimal control; Q-learning; continuous-time (CT); exploration; policy iteration (PI); reinforcement learning (RL)
Journal_Title :
IEEE Transactions on Neural Networks and Learning Systems
DOI :
10.1109/TNNLS.2014.2328590