Title :
Proposal of a propagation algorithm of the Expected Failure Probability and the effectiveness on multi-agent environments
Author :
Miyazaki, Kazuteru ; Muraoka, Hiroki ; Kobayashi, Hiroaki
Author_Institution :
Research Department, National Institution for Academic Degrees and University Evaluation, Tokyo, Japan
Abstract :
The Improved Penalty Avoiding Rational Policy Making algorithm (IPARP) can learn from both a reward and a penalty. IPARP aims to find penalty rules that have a high possibility of receiving a penalty. Though IPARP is effective in many cases, it needs many trial-and-error searches due to memory constraints. In this paper, a propagation algorithm of the Expected Failure Probability (EFP) is proposed to speed up IPARP. Furthermore, the method is extended to multi-agent environments. In multi-agent learning, it is important to avoid the concurrent learning problem [1], which occurs when multiple agents learn concurrently. Hence, two methods are proposed to avoid the problem, and their effectiveness is confirmed by numerical experiments.
Keywords :
Boltzmann distribution; Educational institutions; Electronic mail; Learning (artificial intelligence); Least squares methods; Memory management; Proposals; Exploitation-oriented Learning; Multi-agent learning; Reinforcement Learning; concurrent learning problem
Conference_Title :
SICE Annual Conference (SICE), 2013 Proceedings of
Conference_Location :
Nagoya, Japan