Title :
Iterated approximate value functions
Author :
O'Donoghue, Brendan; Wang, Yang; Boyd, Stephen
Abstract :
In this paper we introduce a control policy, which we refer to as the iterated approximate value function policy. Generating the policy involves two stages: the first is carried out off-line, the second on-line. In the first stage we simultaneously compute a trajectory of moments of the state and action and a sequence of approximate value functions optimized for that trajectory. In the second stage we carry out control using the generated sequence of approximate value functions, which yields a time-varying policy even when the optimal policy is time-invariant. We restrict our attention to problems with linear dynamics and a quadratically representable stage cost. In this case the pre-computation stage requires solving a semidefinite program (SDP), and finding the control action at each time period requires solving a small convex optimization problem, which can be done quickly. We conclude with some examples.
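Illustrative example :
To give a concrete feel for the two stages, the following is a minimal sketch, not the paper's iterated moment-trajectory formulation: it computes a single time-invariant quadratic value-function lower bound via the standard Bellman-inequality LMI (an SDP, solved here with CVXPY), and then evaluates the resulting one-step greedy policy on-line. All problem data (A, B, Q, R, gamma, Sigma) are assumed for illustration, and the additive noise term is dropped, since for a quadratic value function it only shifts the bound by a constant and does not change the greedy action.

# Minimal sketch (assumed data, not the paper's exact formulation):
# fit a quadratic lower bound V(x) = x'Px via the Bellman-inequality
# LMI, then evaluate the resulting one-step greedy policy.
import numpy as np
import cvxpy as cp

n, m = 4, 2           # state and input dimensions (illustrative)
gamma = 0.95          # discount factor
rng = np.random.default_rng(0)
A = rng.standard_normal((n, n)) / np.sqrt(n)   # dynamics (illustrative)
B = rng.standard_normal((n, m))
Q = np.eye(n)         # quadratic stage cost x'Qx + u'Ru
R = 0.1 * np.eye(m)
Sigma = np.eye(n)     # second moment of the initial state (illustrative)

# Off-line stage: find P such that V(x) = x'Px satisfies the Bellman
# inequality V(x) <= min_u [ x'Qx + u'Ru + gamma * V(Ax + Bu) ],
# which for quadratics is the linear matrix inequality M >> 0 below.
P = cp.Variable((n, n), symmetric=True)
M = cp.bmat([
    [Q + gamma * A.T @ P @ A - P, gamma * A.T @ P @ B],
    [gamma * B.T @ P @ A,         R + gamma * B.T @ P @ B],
])
prob = cp.Problem(cp.Maximize(cp.trace(Sigma @ P)), [M >> 0, P >> 0])
prob.solve()
P_val = P.value

# On-line stage: the greedy action minimizes stage cost plus discounted
# approximate next-state value; here the small convex problem is a QP
# with the closed-form solution u = -(R + g B'PB)^{-1} g B'PA x.
def policy(x):
    K = np.linalg.solve(R + gamma * B.T @ P_val @ B,
                        gamma * B.T @ P_val @ A)
    return -K @ x

x = rng.standard_normal(n)
print("greedy action:", policy(x))

In the paper's setting, the off-line SDP instead couples the moment trajectory with a sequence of approximate value functions, producing a different quadratic at each time step; the sketch above shows only the time-invariant special case of the greedy evaluation.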
Keywords :
convex programming; iterative methods; optimal control; stochastic systems; SDP; infinite horizon discounted stochastic control problem; iterated approximate value function policy; iterated approximate value functions; linear dynamics; optimal control policy; semidefinite programming; small convex optimization problem; stage cost function; time-invariant system; time-varying policy; Approximation methods; Convex functions; Dynamic programming; Noise; Optimization; Trajectory; Vectors
Conference_Title :
2013 European Control Conference (ECC)
Conference_Location :
Zurich, Switzerland