DocumentCode :
859767
Title :
Dynamic programming approach to a final-value control system with a random variable having an unknown distribution function
Author :
Aoki, Masanao
Author_Institution :
University of California, Los Angeles, CA, USA
Volume :
5
Issue :
4
fYear :
1960
fDate :
9/1/1960 12:00:00 AM
Firstpage :
270
Lastpage :
283
Abstract :
As described in the introduction of this paper, the state vector $x_n$, representing the present state of a sampling control system, is assumed to satisfy a difference equation $x_{n+1} = T(x_n, r_n, \upsilon_n)$, where $\upsilon_n$ is the control vector of the system, which is subject to random disturbances $r_n$. The random variables $r_n$ are assumed to be mutually independent and to be defined in a parameter space in the following manner: Nature is assumed to be in one of a finite number $q$ of possible states, and each state has its own parameter value, which uniquely and unequivocally specifies the distribution function of $r_n$. It is further assumed that we are given the a priori probabilities $z = (z_1, \ldots, z_q)$ of the possible states of nature $i = 1, 2, \ldots, q$, with $\sum_{i=1}^{q} z_i = 1$ and $z_i \geq 0$. Given a criterion of performance, the duration of the process, and the domain of the control variable, a sequence of control variables $\{\upsilon_n\}$ is to be determined as a function of the state vector of the system and time so as to optimize the performance. The determination of $\upsilon_n$ is necessarily sequential, since the stochastic nature of the problem prevents specifying such a sequence of $\upsilon$'s as a function of the initial state and time alone. By means of the functional equation technique of dynamic programming, a recurrence relation is derived for the criterion function of the process, $k_n(x, z)$, where $x$ is the state variable of the system when $n$ control stages remain. When $q$ is taken to be two and the criterion of performance to be $x_N^2$, with the constraint $\sum_{i=0}^{N-1} \upsilon_i^2 \leq K$ on the control variables, where $x_N$ is the final state vector of the system, it is shown that $k_n(x, z)$ has the form $w_n(z) x^2$ and that the optimal $\upsilon_n(x, z)$ is linear in $x$, as might be expected. The dependence of $k_n(x, z)$ and $\upsilon_n(x, z)$ on $z$ is investigated further, and $w_n(z)$ is found to be concave in $z$.
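For orientation, the recurrence mentioned above can be written in the standard Bayesian dynamic-programming form sketched here. This is an illustrative reconstruction under stated assumptions, not the paper's own equation: it assumes that $r_n$ can be recovered from $x_n$, $x_{n+1}$, and $\upsilon_n$; that under state of nature $i$ the disturbance has distribution function $F_i$ with density $f_i$ (hypothetical notation); that the terminal cost is the final-value criterion $x^2$; and it omits the constraint on $\sum \upsilon_i^2$.

```latex
k_0(x, z) = x^2, \qquad
k_n(x, z) = \min_{\upsilon} \sum_{i=1}^{q} z_i \int k_{n-1}\bigl(T(x, r, \upsilon),\, z'(r)\bigr)\, dF_i(r),
\qquad
z_i'(r) = \frac{z_i f_i(r)}{\sum_{j=1}^{q} z_j f_j(r)}, \quad i = 1, \ldots, q.
```

The posterior $z'$ is what makes the policy adaptive: each observed disturbance shifts probability toward the states of nature that best explain it.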
Explicit quadratic forms for $w_n(z)$ are obtained in the neighborhoods of $z = 0$ and $z = 1$. The optimal $\upsilon_n(x, z)$ is found to be monotonically decreasing in $z$. When the domain of $\upsilon_n$ is restricted to a finite set of values, as in contactor servo systems, no explicit expressions for $k_n(x, z)$ and $\upsilon_n(x, z)$ are available. However, $k_n(x, z)$ is still concave in $z$ and is approximately given by $z k_n(x, 1) + (1 - z) k_n(x, 0)$. Solving the recurrence relation numerically shows this approximation to be very good for moderately large $n$, say 10. This means that if explicit solutions are available for $p = p_1$ and $p = p_2$, the criterion function for the adaptive case where $\Pr(p = p_1) = z$ can be approximated by the linear function of $z$ given above. This provides a fairly good lower bound on $k_n(x, z)$ and in turn serves to determine an initial approximate policy. A concept of suboptimal policy is introduced, and numerical experiments are performed to test the suboptimal policy suggested by the numerical solution. The system behavior under the suboptimal policy is also discussed.
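To illustrate the linear approximation and the lower bound it provides, the Python sketch below solves the recurrence numerically for a deliberately simple, hypothetical model that is not taken from the paper: a scalar system $x_{n+1} = x_n + \upsilon_n + r_n$ with $r_n = \pm 1$, $\Pr(r_n = +1) = p$, $p \in \{p_1, p_2\}$, a finite control set $\{-1, 0, 1\}$ standing in for a contactor servo, and terminal cost $x_N^2$. The function name `make_value_function` and all numeric values are illustrative assumptions. Because the Bayesian criterion function is a minimum over policies of functions linear in $z$, it is concave in $z$, so $z k_n(x, 1) + (1 - z) k_n(x, 0)$ never exceeds $k_n(x, z)$; the printed comparison shows how tight that bound is for this toy model.

```python
from functools import lru_cache


def make_value_function(p1, p2, z0, controls=(-1, 0, 1)):
    """Return k(n, x): minimal expected x_N^2 with n stages left and prior Pr(p = p1) = z0.

    Hypothetical toy model (an illustrative assumption, not the paper's system):
      x_{n+1} = x_n + v_n + r_n, with r_n = +1 w.p. p and -1 w.p. 1 - p,
      where p = p1 with prior probability z0 and p = p2 otherwise (0 < p1, p2 < 1).
    """

    def posterior(ups, downs):
        # Bayes' rule: Pr(p = p1) after observing `ups` (+1) and `downs` (-1) disturbances.
        # The counts are a sufficient statistic, so they stand in for the belief z.
        a = z0 * p1**ups * (1 - p1) ** downs
        b = (1 - z0) * p2**ups * (1 - p2) ** downs
        return a / (a + b)

    @lru_cache(maxsize=None)
    def k(n, x, ups=0, downs=0):
        if n == 0:
            return float(x * x)  # terminal (final-value) cost x_N^2
        z = posterior(ups, downs)
        p_up = z * p1 + (1 - z) * p2  # predictive Pr(r_n = +1) under the current belief
        best = float("inf")
        for v in controls:  # finite control set, as in a contactor servo
            expected = (p_up * k(n - 1, x + v + 1, ups + 1, downs)
                        + (1 - p_up) * k(n - 1, x + v - 1, ups, downs + 1))
            best = min(best, expected)
        return best

    return k


if __name__ == "__main__":
    p1, p2, n, x = 0.8, 0.3, 10, 3
    k_known_p1 = make_value_function(p1, p2, 1.0)  # k_n(x, 1): p = p1 known
    k_known_p2 = make_value_function(p1, p2, 0.0)  # k_n(x, 0): p = p2 known
    for z in (0.25, 0.5, 0.75):
        exact = make_value_function(p1, p2, z)(n, x)
        linear = z * k_known_p1(n, x) + (1 - z) * k_known_p2(n, x)
        print(f"z={z:.2f}  k_n(x,z)={exact:.4f}  linear bound={linear:.4f}")
```

Tracking the belief through observation counts rather than a floating-point $z$ keeps the memoization keys exact.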
Keywords :
Control systems; Difference equations; Distribution functions; Dynamic programming; Optimal control; Random variables; Sampling methods; Servomechanisms; Stochastic processes; Vectors;
fLanguage :
English
Journal_Title :
Automatic Control, IRE Transactions on
Publisher :
IEEE
ISSN :
0096-199X
Type :
jour
DOI :
10.1109/TAC.1960.1105020
Filename :
1105020