DocumentCode :
3740419
Title :
Improving Multi-agent Learners Using Less-Biased Value Estimators
Author :
Sherief Abdallah;Michael Kaisers
Author_Institution :
Fac. of Eng. &
Volume :
2
fYear :
2015
Firstpage :
120
Lastpage :
124
Abstract :
Many different value-based and policy-search reinforcement learning algorithms have been applied to multi-agent settings. Value-based learners estimate the expected return (value) for each state-action combination and then derive a policy from these expectations. Policy-search learners optimize the agent's policy directly, using a parameterized representation of the policy and adjusting the parameter values to maximize the expected return. While the two classes of algorithms are often viewed as contrasting approaches, we note that several policy-search algorithms (e.g., the Weighted Policy Learner and Infinitesimal Gradient Ascent) need a method for estimating the expected returns. In practice, these policy-search algorithms internally use an update equation for incrementally improving value estimates. In this paper we present the first detailed study of the effect of using different value-based learning algorithms as components of policy-search learners. Our results show that the particular choice can significantly affect performance.
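Illustration (a sketch, not the paper's code): one common incremental value update of the kind the abstract refers to is Q(a) <- Q(a) + alpha * (r - Q(a)), embedded as a component inside a gradient-ascent-style policy-search learner. The learner class, the Matching Pennies pairing, and all parameter values below are illustrative assumptions.

import random

ALPHA = 0.1   # value-estimate learning rate (assumed)
ETA = 0.01    # policy learning rate (assumed)

class PolicySearchLearner:
    """Policy-search learner with an embedded incremental value estimator."""

    def __init__(self, n_actions):
        self.n = n_actions
        self.q = [0.0] * n_actions               # incremental value estimates
        self.pi = [1.0 / n_actions] * n_actions  # mixed (stochastic) policy

    def act(self):
        # Sample an action from the current mixed policy.
        r, acc = random.random(), 0.0
        for a, p in enumerate(self.pi):
            acc += p
            if r <= acc:
                return a
        return self.n - 1

    def update(self, action, reward):
        # Incremental value update: the replaceable component the paper studies.
        self.q[action] += ALPHA * (reward - self.q[action])
        # Gradient-ascent-style policy step toward actions whose estimated
        # value exceeds the policy's current expected value, then renormalize.
        v = sum(p * q for p, q in zip(self.pi, self.q))
        self.pi = [max(1e-3, p + ETA * (q - v)) for p, q in zip(self.pi, self.q)]
        total = sum(self.pi)
        self.pi = [p / total for p in self.pi]

# Usage sketch: two learners in Matching Pennies (a zero-sum matrix game
# commonly used in this literature; the pairing here is for illustration).
payoff = [[1, -1], [-1, 1]]  # row player's payoffs
p1, p2 = PolicySearchLearner(2), PolicySearchLearner(2)
for _ in range(10000):
    a1, a2 = p1.act(), p2.act()
    p1.update(a1, payoff[a1][a2])
    p2.update(a2, -payoff[a1][a2])
print("p1 policy:", [round(p, 3) for p in p1.pi])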
Keywords :
"Games","Prediction algorithms","Algorithm design and analysis","Approximation algorithms","Learning (artificial intelligence)","Mathematical model","Electronic mail"
Publisher :
ieee
Conference_Title :
2015 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology (WI-IAT)
Type :
conf
DOI :
10.1109/WI-IAT.2015.113
Filename :
7397346