DocumentCode :
681073
Title :
Proposal of a propagation algorithm of the Expected Failure Probability and the effectiveness on multi-agent environments
Author :
Miyazaki, Kazuteru ; Muraoka, Hiroki ; Kobayashi, Hiroaki
Author_Institution :
Research Department, National Institution for Academic Degrees and University Evaluation, Tokyo, Japan
fYear :
2013
fDate :
14-17 Sept. 2013
Firstpage :
1067
Lastpage :
1072
Abstract :
The Improved Penalty Avoiding Rational Policy Making algorithm (IPARP) can learn from both a reward and a penalty. IPARP aims to find penalty rules, that is, rules that are likely to incur a penalty. Though IPARP is effective in many cases, it needs many trial-and-error searches due to memory constraints. In this paper, a propagation algorithm of the Expected Failure Probability (EFP) is proposed to speed up learning. Furthermore, the algorithm is extended to multi-agent environments. In multi-agent learning, it is important to avoid the concurrent learning problem [1], which occurs when multiple agents learn concurrently. Hence, two methods are proposed to avoid the problem, and their effectiveness is confirmed by numerical experiments.
Keywords :
Boltzmann distribution; Educational institutions; Electronic mail; Learning (artificial intelligence); Least squares methods; Memory management; Proposals; Exploitation-oriented Learning; Multi-agent learning; Reinforcement Learning; concurrent learning problem;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
SICE Annual Conference (SICE), 2013 Proceedings of
Conference_Location :
Nagoya, Japan
Type :
conf
Filename :
6736240