Regional Information Center for Science and Technology - Improving the Accuracy of the Cases in the Automatic Case Elicitation-Based Hybrid Agents for Checkers

Abstract:

This work proposes improving the calculation of the rating of cases generated by hybrid player agents that combine Automatic Case Elicitation with static problem solvers. The system used as a benchmark is the Checkers agent ACE-RL-Checkers, a hybrid system that combines the best abilities of the automatic Checkers players CHEBR and LS-VisionDraughts. CHEBR is an Automatic Case Elicitation-based agent whose learning approach performs random exploration of the search space. These random explorations give the agent extremely adaptive, non-deterministic behavior. On the other hand, the high frequency of random decisions compromises the agent's ability to maintain good performance. LS-VisionDraughts is a Multi-Layer Perceptron neural network player trained through Reinforcement Learning. Its drawback is that it is completely predictable: it always executes the same move when presented with the same board state. By combining the best abilities of these players, ACE-RL-Checkers uses knowledge provided by LS-VisionDraughts to direct the random exploration of the Automatic Case Elicitation technique toward more promising regions of the search space. As a result, ACE-RL-Checkers gains in performance and acquires adaptability in its decision-making, choosing moves based on the current game dynamics. Although ACE-RL-Checkers has proven efficient against its adversaries, the authors propose in the present paper two alternative strategies for calculating the rating of the cases generated in ACE-RL-Checkers, with the aim of improving future performance. Briefly, the two strategies investigated are the elimination of the decaying memory and the insertion of the exploration/exploitation tradeoff inherent to the UCT (Upper Confidence bounds applied to Trees) technique.
Experiments carried out in tournaments between these new strategies and the original strategy adopted in ACE-RL-Checkers confirm that the proposed strategies improve the accuracy of the generated cases and, consequently, performance relative to the original strategy.
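The abstract does not give the rating formulas, so the following is only a minimal sketch of the exploration/exploitation tradeoff as it appears in the standard UCB1 rule that UCT is built on, not the authors' rating calculation. The function names, the dictionary keys (`name`, `reward`, `visits`), and the default exploration constant are assumptions made for illustration.

```python
import math


def ucb1_score(mean_reward, option_visits, total_visits, explore_c=math.sqrt(2)):
    """Standard UCB1 value: average reward plus an exploration bonus.

    The bonus shrinks as an option is tried more often, so rarely tried
    options are favored until their estimates become reliable.
    """
    if option_visits == 0:
        return math.inf  # untried options are always explored first
    bonus = explore_c * math.sqrt(math.log(total_visits) / option_visits)
    return mean_reward + bonus


def select_case(cases):
    """Return the case with the highest UCB1 score.

    `cases` is a list of dicts with hypothetical keys:
    'name', 'reward' (accumulated reward) and 'visits' (times used).
    """
    total = sum(case["visits"] for case in cases)

    def score(case):
        visits = case["visits"]
        mean = case["reward"] / visits if visits else 0.0
        return ucb1_score(mean, visits, total)

    return max(cases, key=score)


if __name__ == "__main__":
    stored = [
        {"name": "well_tested", "reward": 9.0, "visits": 10},
        {"name": "barely_tested", "reward": 1.0, "visits": 1},
    ]
    print(select_case(stored)["name"])
```

In this sketch a case that has been used only once can outrank a frequently used one with a similar average, which is the kind of balance between reusing known-good cases and exploring uncertain ones that the abstract attributes to UCT.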