Neural-network based online policy iteration for continuous-time infinite-horizon optimal control of nonlinear systems

Author

Difan Tang; Lei Chen; Zhao Feng Tian

Author_Institution

School of Mechanical Engineering, University of Adelaide, Adelaide, SA, Australia

fYear

2015

fDate

12-15 July 2015

Firstpage

792

Lastpage

796

Abstract

A new policy-iteration algorithm based on neural networks (NNs) is proposed in this paper to synthesize optimal control laws online for continuous-time nonlinear systems. Recent advances in this field have enabled synchronous policy iteration, but they require an additional tuning loop or a logic switch mechanism to maintain system stability. This paper derives a new algorithm that removes this limitation. The optimal control law is found by solving the Hamilton-Jacobi-Bellman (HJB) equation for the associated value function via synchronous policy iteration in a critic-actor configuration. As the main contribution, a new form of NN approximation for the value function is proposed that makes the closed-loop system asymptotically stable without any additional tuning scheme or logic switch mechanism. As a second contribution, an extended Kalman filter is introduced to estimate the critic NN parameters and speed up convergence. The efficacy of the new algorithm is verified by simulations.

Keywords

Kalman filters; closed loop systems; continuous time systems; control system synthesis; infinite horizon; neurocontrollers; nonlinear control systems; nonlinear filters; optimal control; stability; HJB equation; Hamilton-Jacobi-Bellman equation; NN approximation; NNs; associated value function; closed-loop system asymptotic stability; continuous-time infinite-horizon optimal control; continuous-time nonlinear systems; critic NN parameter estimation; critic-actor configuration; extended Kalman filter; logic switch mechanism; neural networks; online policy iteration; optimal control law synthesis; synchronous policy iteration; system stability; tuning loop; Approximation methods; Decision support systems; Dynamic programming; Markov processes; Radio frequency; Robustness; TV; machine learning; neural network; nonlinear system; policy iteration

fLanguage

English

Publisher

IEEE

Conference_Title

2015 IEEE China Summit and International Conference on Signal and Information Processing (ChinaSIP)

Conference_Location

Chengdu

Type

conf

DOI

10.1109/ChinaSIP.2015.7230513

Filename

7230513