Title :
Online Synchronous Policy Iteration based on Concurrent Learning to solve continuous-time optimal control problem
Author :
Haitao Wang;Dongbin Zhao;Chengdong Li
Author_Institution :
The State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
fDate :
4/1/2015 12:00:00 AM
Abstract :
In this paper, a novel online Synchronous Policy Iteration algorithm based on Concurrent Learning (CLSPI) is presented to solve the continuous-time optimal control problem. The algorithm is built on the actor-critic architecture. The original Synchronous Policy Iteration (SPI) algorithm updates the actor and critic parameters simultaneously, which differs from standard policy iteration, where policy evaluation and policy improvement alternate. SPI uses only the current estimation error to update the weights, although previously recorded information can also contribute to the weight update. Concurrent learning is a parameter estimation method that combines previously recorded information with the current estimation error. CLSPI uses concurrent learning to train SPI in order to improve learning performance. Finally, two comparison experiments, one on a linear system and one on a nonlinear system, demonstrate that CLSPI obtains the optimal control policy with a faster convergence rate, which shows that CLSPI is more efficient for solving continuous-time optimal control problems.
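The idea of combining recorded information with the current estimation error can be illustrated with a minimal sketch. This is not the CLSPI algorithm from the paper: it is a generic, discrete-time concurrent-learning update for a linear-in-parameters estimator, and every name in it (`estimate`, `memory`, `lr`, the toy regressors) is an assumption made for illustration. It shows that when the current regressor is not exciting enough, a plain gradient update stalls, while adding the gradient over a small set of recorded data points lets the estimate converge.

```python
import numpy as np

def estimate(theta_true, phi_stream, memory, lr=0.05):
    """Gradient parameter estimation with an optional concurrent-learning term.

    theta_true : true weights, used here only to generate measurements.
    phi_stream : regressor vectors observed online, one per step.
    memory     : recorded regressors replayed at every step (empty list
                 gives the plain, current-error-only update).
    """
    theta = np.zeros_like(theta_true)
    # Measurements stored alongside the recorded regressors.
    recorded = [(phi_j, phi_j @ theta_true) for phi_j in memory]
    for phi in phi_stream:
        # Current estimation error and its gradient contribution.
        err = phi @ theta - phi @ theta_true
        grad = phi * err
        # Concurrent-learning term: errors on recorded data, re-evaluated
        # with the current estimate.
        for phi_j, y_j in recorded:
            grad = grad + phi_j * (phi_j @ theta - y_j)
        theta = theta - lr * grad
    return theta

theta_true = np.array([1.0, -2.0])
# The online regressor only ever excites the first parameter.
phi_stream = [np.array([1.0, 0.0])] * 500
# Recorded data spans both parameter directions.
memory = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]

theta_plain = estimate(theta_true, phi_stream, memory=[])
theta_cl = estimate(theta_true, phi_stream, memory=memory)

err_plain = np.linalg.norm(theta_plain - theta_true)
err_cl = np.linalg.norm(theta_cl - theta_true)
print("plain gradient error:     ", err_plain)
print("concurrent learning error:", err_cl)
```

In this toy setting the plain update never learns the second weight because the online data carries no information about it, whereas the update with recorded data recovers both weights. The recorded regressors are chosen to span the parameter space, which is the kind of condition concurrent learning relies on in place of persistently exciting online data.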
Keywords :
"Artificial neural networks","Computational intelligence"
Conference_Titel :
Information Science and Technology (ICIST), 2015 5th International Conference on
DOI :
10.1109/ICIST.2015.7288986