Title :
Integral Reinforcement Learning for online computation of feedback Nash strategies of nonzero-sum differential games
Author :
Vrabie, Draguna ; Lewis, Frank
Author_Institution :
Autom. & Robot. Res. Inst., Univ. of Texas at Arlington, Fort Worth, TX, USA
Abstract :
This paper presents an Approximate/Adaptive Dynamic Programming (ADP) algorithm that finds online the Nash equilibrium for two-player nonzero-sum differential games with linear dynamics and infinite-horizon quadratic cost. Each player uses Integral Reinforcement Learning (IRL) to calculate online the infinite-horizon value function that it associates with a given set of feedback control policies. It is shown that the online algorithm is mathematically equivalent to an offline iterative method, previously introduced in the literature, that solves the set of coupled algebraic Riccati equations (ARE) underlying the game problem using complete knowledge of the system dynamics. The ADP techniques extend the offline method, allowing an online solution without the requirement of complete knowledge of the system dynamics. The two participants in the continuous-time differential game compete in real time, and the feedback Nash control strategies are determined from data measured online from the system. The algorithm is built on the interplay between a learning phase, in which each player learns online the value that it associates with a given set of play policies, and a policy update step, performed by each of the players to decrease the value of its cost. The players learn concurrently. The feasibility of the ADP scheme is demonstrated in simulation.
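The interplay between the learning phase and the policy update step described in the abstract can be illustrated on a scalar game. The following is only a minimal sketch under stated assumptions, not the authors' implementation: the plant parameters, the cost weights, the interval length `T`, and the function names `measure`, `irl_iteration`, and `are_residuals` are all hypothetical, and the closed-form plant response stands in for online measurements. The learner evaluates each player's value from the measured state and integrated cost over one interval, so it never reads the drift coefficient `A`, although the policy update still uses the input gains.

```python
import math

# Hypothetical scalar game: dx/dt = A*x + B1*u1 + B2*u2, with u_i = -k_i * x.
# All numbers below are illustrative assumptions, not taken from the paper.
A, B1, B2 = -1.0, 1.0, 1.0   # A < 0, so the initial gains k1 = k2 = 0 are stabilizing
Q1, Q2 = 1.0, 2.0            # state weights of players 1 and 2
R11, R22 = 1.0, 1.0          # weights on each player's own control
R12, R21 = 0.0, 0.0          # weights on the other player's control


def measure(k1, k2, x0=1.0, T=0.1):
    """Stand-in for the plant: state at time T and each player's cost on [0, T].

    Only this function uses A; the learner sees nothing but its outputs.
    """
    ac = A - B1 * k1 - B2 * k2                                  # closed-loop pole
    xT = x0 * math.exp(ac * T)
    int_x2 = x0**2 * (math.exp(2 * ac * T) - 1.0) / (2 * ac)    # integral of x^2
    c1 = (Q1 + R11 * k1**2 + R12 * k2**2) * int_x2
    c2 = (Q2 + R22 * k2**2 + R21 * k1**2) * int_x2
    return x0, xT, c1, c2


def irl_iteration(k1=0.0, k2=0.0, steps=50):
    """Alternate data-based policy evaluation and policy update for both players."""
    p1 = p2 = 0.0
    for _ in range(steps):
        x0, xT, c1, c2 = measure(k1, k2)
        # Learning phase: p_i * x(t)^2 = cost on [t, t+T] + p_i * x(t+T)^2
        p1 = c1 / (x0**2 - xT**2)
        p2 = c2 / (x0**2 - xT**2)
        # Policy update: each player moves to the gain implied by its own value
        k1 = B1 * p1 / R11
        k2 = B2 * p2 / R22
    return p1, p2, k1, k2


def are_residuals(p1, p2):
    """Residuals of the scalar coupled Riccati equations (uses full model knowledge)."""
    k1, k2 = B1 * p1 / R11, B2 * p2 / R22
    ac = A - B1 * k1 - B2 * k2
    r1 = 2 * ac * p1 + Q1 + R11 * k1**2 + R12 * k2**2
    r2 = 2 * ac * p2 + Q2 + R22 * k2**2 + R21 * k1**2
    return r1, r2


if __name__ == "__main__":
    p1, p2, k1, k2 = irl_iteration()
    print("values:", p1, p2)
    print("gains:", k1, k2)
    print("Riccati residuals:", are_residuals(p1, p2))
```

In this scalar setting the data-based evaluation gives the same value as solving the closed-loop Lyapunov equation directly, which mirrors the equivalence to the offline method claimed in the abstract. Convergence of the iteration is observed for these particular numbers and is not guaranteed for arbitrary games.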
Keywords :
Riccati equations; differential games; dynamic programming; iterative methods; learning (artificial intelligence); approximate-adaptive dynamic programming algorithm; coupled algebraic Riccati equations; feedback Nash strategies; infinite horizon quadratic cost; infinite horizon value function; integral reinforcement learning; linear dynamics; offline iterative method; online computation; two-player nonzero-sum differential games; Cost function; Games; Heuristic algorithms; Infinite horizon; Learning; Nash equilibrium;
Conference_Title :
Decision and Control (CDC), 2010 49th IEEE Conference on
Conference_Location :
Atlanta, GA
Print_ISBN :
978-1-4244-7745-6
DOI :
10.1109/CDC.2010.5718152