مرکز منطقه ای اطلاع رساني علوم و فناوري - On a staged successive approximation procedure for approximate dynamic programming solutions to board game playing

Abstract :

In board game playing, many endgames may be solved based on dynamic programming (DP) with value-to-go functions that evaluate each state towards final winning positions. We first identify all positions to win in one move; next, recognize the next set of positions to win in two moves; and then do so in three moves, and so on. In this way, the sets of constant values (moving towards winning) are produced sequentially. This observation motivates our interest in what class of board game problems does the computation of starting points of constant value work efficiently. Furthermore, certain situations may arise repeatedly during the play; this corresponds to a problem where some values on the right-hand side of Bellman´s DP optimality equation are unknown when the specific value function on the left-hand side is evaluated. To resolve this issue, one may typically employ Bellman´s successive approximation procedure directly on a whole large problem, but we could exploit the aforementioned observation; that is, we organize the procedure in a sequentially stagewise fashion, starting with an initial smallest sub-problem to final outcome; then, advance to the next smallest local sub-problem, using approximate solutions of the previous sub-problems. Proceed in this manner until the desired problem scale is reached. We demonstrate this local-search idea (called staged successive approximation procedures) in a two-player board game found in the literature.