Dynamic programming approach to a final-value control system with a random variable having an unknown distribution function

Abstract:

As described in the introduction of this paper, the state vector $x_n$, representing the present state of a sampling control system, is assumed to satisfy a difference equation $x_{n+1} = T(x_n, r_n, \upsilon_n)$, where $\upsilon_n$ is the control vector of the system subject to random disturbances $r_n$. The random variables $r_n$ are assumed to be independent of each other and to be defined in a parameter space in the following manner: Nature is assumed to be in one of a finite number, $q$, of possible states, and each state has its own parameter value, thus specifying uniquely and unequivocally the distribution function of $r_n$. It is further assumed that we are given the a priori probability $z = (z_1, \ldots, z_q)$ of each possible state of nature, $i = 1, 2, \ldots, q$, with $\sum_{i=1}^{q} z_i = 1$, $z_i \geq 0$. Given a criterion of performance, the duration of the process, and the domain of the control variable, a sequence of control variables $\{\upsilon_n\}$ is to be determined as a function of the state vector of the system and time, so as to optimize the performance. The sequential nature of the determination of $\upsilon_n$ is evident here, since the stochastic nature of the problem prevents specification of such a sequence of $\upsilon$'s as a function of the initial state and time. By means of the functional equation technique of dynamic programming, a recurrence relation for the criterion function of the process, $k_n(x, z)$, is derived, where $x$ is the state variable of the system when there remain $n$ control stages. When $q$ is taken to be two and the criterion of performance to be $x_N^2$ with the constraint on the control variables $\sum_{i=0}^{N-1} \upsilon_i^2 \leq K$, where $x_N$ is the final state vector of the system, it is shown that $k_n(x, z)$ is of the form $w_n(z) x^2$ and the optimal $\upsilon_n(x, z)$ is linear in $x$, as might be expected. The dependence of $k_n(x, z)$ and $\upsilon_n(x, z)$ on $z$ is investigated further, and $w_n(z)$ is found to be concave in $z$.
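For orientation, a recurrence of the kind referred to above can be sketched in the generic Bayesian form below. This is an illustrative reconstruction under stated assumptions, not the paper's own derivation: $f_i(r)$ is a hypothetical symbol for the probability (or density) of the disturbance $r$ under the $i$-th state of nature, $z'(r)$ denotes the a posteriori probability vector after observing $r$, the terminal cost is the scalar final-value criterion $x_N^2$, and the constraint $\sum \upsilon_i^2 \leq K$ is left out for simplicity.

```latex
\begin{aligned}
k_0(x, z) &= x^2, \\
k_n(x, z) &= \min_{\upsilon} \sum_{i=1}^{q} z_i \,
             \mathbb{E}_{r \sim f_i}\!\left[ k_{n-1}\big(T(x, r, \upsilon),\, z'(r)\big) \right], \\
z'_i(r)   &= \frac{z_i \, f_i(r)}{\sum_{j=1}^{q} z_j \, f_j(r)}, \qquad i = 1, \ldots, q .
\end{aligned}
```

In this form the pair $(x, z)$ acts as the state of the dynamic program: the control changes $x$, while each observed disturbance updates $z$ by Bayes' rule.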
Explicit quadratic forms for $w_n(z)$ are obtained in the neighborhood of $z = 0$ and $z = 1$. The optimal $\upsilon_n(x, z)$ is found to be monotonically decreasing in $z$. When the domain of $\upsilon_n$ is restricted to a finite set of values, as in contactor servo systems, no explicit expressions for $k_n(x, z)$ and $\upsilon_n(x, z)$ are available. However, $k_n(x, z)$ is still concave in $z$ and approximately given by $z k_n(x, 1) + (1 - z) k_n(x, 0)$. By solving the recurrence relation numerically, this approximation is found to be very good for moderately large $n$, say 10. This means that if one has explicit solutions for $p = p_1$ and $p = p_2$, then one can approximate the criterion function for the adaptive case where $\Pr(p = p_1) = z$ by a linear function in $z$ as shown above. This provides a fairly good lower bound on $k_n(x, z)$, and in turn serves to determine an initial approximate policy. A concept of suboptimal policy is introduced, and numerical experiments are performed to test the suboptimal policy suggested by the numerical solution. The system behavior under the suboptimal policy is also discussed.
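The finite-control case and the linear lower bound can be illustrated with a small numerical sketch. Everything specific in it is an assumption made for illustration and is not taken from the paper: the scalar transition $x_{n+1} = x_n + \upsilon_n + r_n$, a disturbance $r_n = \pm 1$ with $\Pr(r_n = +1) = p$, the values `P1` and `P2` for the two possible $p$, and the three-level control set `CONTROLS`. The function `k` evaluates the recurrence by direct recursion, and `linear_bound` evaluates $z k_n(x, 1) + (1 - z) k_n(x, 0)$.

```python
# Hypothetical toy model (not from the paper): scalar state, additive +/-1
# disturbance, three-level "contactor" control, final-value cost x_N^2.
P1, P2 = 0.8, 0.3        # assumed Pr(r = +1) under the two states of nature
CONTROLS = (-1, 0, 1)    # assumed finite control set


def posterior(z, r):
    """Bayes update of z = Pr(p = P1) after observing disturbance r."""
    like1 = P1 if r == 1 else 1.0 - P1
    like2 = P2 if r == 1 else 1.0 - P2
    return z * like1 / (z * like1 + (1.0 - z) * like2)


def k(n, x, z):
    """Minimum expected x_N^2 with n stages remaining, state x, prior z."""
    if n == 0:
        return float(x * x)
    p_plus = z * P1 + (1.0 - z) * P2          # predictive Pr(r = +1)
    best = float("inf")
    for v in CONTROLS:
        cost = 0.0
        for r, pr in ((1, p_plus), (-1, 1.0 - p_plus)):
            cost += pr * k(n - 1, x + v + r, posterior(z, r))
        best = min(best, cost)
    return best


def linear_bound(n, x, z):
    """Linear interpolation between the two known-parameter solutions."""
    return z * k(n, x, 1.0) + (1.0 - z) * k(n, x, 0.0)


if __name__ == "__main__":
    n, x = 5, 2
    for z in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(f"z={z:.2f}  k={k(n, x, z):.4f}  linear={linear_bound(n, x, z):.4f}")
```

Because `k` is a minimum over policies of costs that are linear in $z$, it is concave in $z$, so `linear_bound` never exceeds it in this toy model. How tight the bound becomes for larger $n$ is the paper's numerical finding and is not something this sketch establishes. The plain recursion grows as $6^n$ calls, so it is only practical for small $n$.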