Optimal adaptive control for unknown systems using output feedback by reinforcement learning methods

Author

Lewis, F.L. ; Vamvoudakis, Kyriakos G.

Author_Institution

Autom. & Robot. Res. Inst., Univ. of Texas at Arlington, Fort Worth, TX, USA

fYear

2010

fDate

9-11 June 2010

Firstpage

2138

Lastpage

2145

Abstract

Optimal feedback controllers are generally computed offline assuming full knowledge of the system dynamics. Adaptive controllers, on the other hand, are online schemes that effectively learn to compensate for unknown system dynamics and disturbances. Generally, direct adaptive schemes do not converge to optimal control solutions for user-prescribed performance measures. In recent years, it has been shown that reinforcement learning techniques from computational intelligence can be used to learn optimal feedback controllers online using direct adaptive control techniques without knowing the system dynamics. Most reinforcement learning methods require full measurements of the system's internal state. In this paper we develop reinforcement learning methods that require only output feedback and yet converge to an optimal controller. Deterministic linear time-invariant systems are considered. Both policy iteration (PI) and value iteration (VI) algorithms are derived. This corresponds to optimal control for a class of partially observable Markov decision processes (POMDPs). It is shown that, similar to Q-learning, the new output-feedback optimal learning methods have the important advantage that knowledge of the system dynamics is not needed for their implementation. Only the order of the system and an upper bound on its ‘observability index’ must be known. The learned output feedback controller is in the form of a polynomial ARMA controller that performs equivalently to the optimal state variable feedback gain.
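
The abstract's claim that output feedback can match state feedback rests on a standard fact about observable linear time-invariant systems: the current state is a linear function of a finite window of past inputs and outputs, provided the window is at least as long as the observability index. The following is a minimal Python sketch of that fact only, not the paper's learning algorithm. The names `io_state_maps`, `M_u`, `M_y`, and `demo`, and the example matrices, are illustrative assumptions. The sketch uses A, B, and C solely to verify the identity numerically; the methods described in the abstract do not require them.

```python
import numpy as np

def io_state_maps(A, B, C, N):
    """Return (M_u, M_y) such that x_N = M_u @ U + M_y @ Y.

    U stacks u_0..u_{N-1} and Y stacks y_0..y_{N-1} in chronological order,
    for x_{i+1} = A x_i + B u_i, y_i = C x_i. Requires the N-step
    observability matrix to have full column rank.
    """
    n, m = B.shape
    p = C.shape[0]
    Apow = [np.linalg.matrix_power(A, i) for i in range(N + 1)]
    # Observability matrix: Y = O x_0 + T U
    O = np.vstack([C @ Apow[i] for i in range(N)])
    if np.linalg.matrix_rank(O) < n:
        raise ValueError("window N is too short to reconstruct the state")
    # Lower block-triangular map from past inputs to outputs
    T = np.zeros((N * p, N * m))
    for i in range(N):
        for j in range(i):
            T[i * p:(i + 1) * p, j * m:(j + 1) * m] = C @ Apow[i - 1 - j] @ B
    # Map from inputs to the final state: x_N = A^N x_0 + Ctrl U
    Ctrl = np.hstack([Apow[N - 1 - j] @ B for j in range(N)])
    # Eliminate x_0 = pinv(O) (Y - T U)
    M_y = Apow[N] @ np.linalg.pinv(O)
    M_u = Ctrl - M_y @ T
    return M_u, M_y

def demo(seed=0):
    """Simulate a small example system and return the reconstruction error."""
    A = np.array([[1.1, 0.3], [0.0, 0.9]])  # hypothetical example system
    B = np.array([[0.0], [1.0]])
    C = np.array([[1.0, 0.0]])
    N = 2
    M_u, M_y = io_state_maps(A, B, C, N)
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(2)
    us, ys = [], []
    for _ in range(N):
        u = rng.standard_normal(1)
        ys.append(C @ x)   # output measured before the state update
        us.append(u)
        x = A @ x + B @ u
    x_rec = M_u @ np.concatenate(us) + M_y @ np.concatenate(ys)
    return float(np.max(np.abs(x_rec - x)))
```

Calling `demo()` returns the largest absolute difference between the simulated state and the state rebuilt from input-output data. Under this identity, any state feedback u = -K x can be rewritten as a linear function of past inputs and outputs, which is consistent with the ARMA controller form mentioned in the abstract.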

Keywords

Adaptive control; Computational intelligence; Control systems; Learning systems; Optimal control; Output feedback; Polynomials; Programmable control; State feedback; Upper bound; Output feedback Approximate Dynamic Programming; Policy Iteration; Value Iteration

fLanguage

English

Publisher

IEEE

Conference_Titel

Control and Automation (ICCA), 2010 8th IEEE International Conference on

Conference_Location

Xiamen, China

ISSN

1948-3449

Print_ISBN

978-1-4244-5195-1

Electronic_ISBN

1948-3449

Type

conf

DOI

10.1109/ICCA.2010.5524211

Filename

5524211