Abstract:
The author formulates and solves a dynamic stochastic optimization problem of a nonstandard type, whose optimal solution features active learning. The proof of optimality and the derivation of the corresponding control policies are indirect, relating the original single-person optimization problem to a sequence of nested zero-sum stochastic games. Existence of saddle points for these games implies the existence of optimal policies for the original control problem, which, in turn, can be obtained from the solution of a nonlinear, deterministic optimal control problem. The author also studies the existence of stationary optimal policies when the time horizon is infinite and the objective function is discounted.
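The abstract's argument turns on the existence of saddle points for zero-sum games. As a minimal, hypothetical illustration of that notion only (this is not the paper's construction, which involves stochastic games), the following sketch checks a finite zero-sum matrix game for a pure-strategy saddle point:

```python
def find_saddle_point(A):
    """Return (i, j, value) for a pure-strategy saddle point of the
    zero-sum matrix game with payoff A[i][j] to the row (maximizing)
    player, or None if no pure saddle point exists."""
    for i, row in enumerate(A):
        for j, v in enumerate(row):
            # Saddle condition: v is the smallest entry in its row
            # (the column player cannot push the payoff lower) and
            # the largest entry in its column (the row player cannot
            # push it higher), so neither player can improve unilaterally.
            if v == min(row) and v == max(r[j] for r in A):
                return i, j, v
    return None

# Illustrative payoff matrix (made up for this sketch): the entry 5 at
# position (1, 1) is minimal in its row and maximal in its column.
A = [[4, 3, 8],
     [9, 5, 6],
     [2, 1, 7]]
print(find_saddle_point(A))  # → (1, 1, 5)
```

At such a point the maximin and minimax values coincide, which is the sense in which a saddle point pins down an optimal policy pair for both players.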
Keywords:
game theory; learning systems; optimal control; optimisation; stochastic systems; active learning; dynamic stochastic optimization; nested zero-sum stochastic games; nonstandard stochastic control; saddle points; control design; control systems; filtering theory; infinite horizon; performance analysis; stochastic processes