DocumentCode :
468384
Title :
Utility Based Q-learning to Maintain Cooperation in Prisoner's Dilemma Games
Author :
Moriyama, Koichi
Author_Institution :
Osaka Univ., Osaka
fYear :
2007
fDate :
2-5 Nov. 2007
Firstpage :
146
Lastpage :
152
Abstract :
This work deals with Q-learning in a multiagent environment. Many multiagent Q-learning methods exist, and most aim to converge to a Nash equilibrium, which is not desirable in games like the prisoner's dilemma (PD). However, normal Q-learning agents that choose actions stochastically to avoid local optima may bring about mutual cooperation in PD. Although such mutual cooperation usually occurs only sporadically, it can be maintained if the Q-function of cooperation becomes larger than that of defection after the cooperation. This work derives a theorem on how many times cooperation must occur to make the Q-function of cooperation larger than that of defection. In addition, building on the author's previous works, which distinguish utilities from rewards and use utilities for learning in PD, this work also derives a corollary on how much utility is necessary to make the Q-function larger through one-shot mutual cooperation.
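The setting described in the abstract can be illustrated with a minimal stateless Q-learning sketch for the repeated PD. The payoff values and the learning parameters (alpha, gamma, epsilon) below are illustrative assumptions, not values taken from the paper; epsilon-greedy stands in for the "stochastic method in choosing actions".

```python
import random

# Standard PD payoff names (illustrative values; the paper's may differ):
# R = mutual cooperation, T = temptation, P = mutual defection, S = sucker.
R, T, P, S = 3.0, 5.0, 1.0, 0.0
PAYOFF = {("C", "C"): R, ("C", "D"): S, ("D", "C"): T, ("D", "D"): P}

class QAgent:
    def __init__(self, alpha=0.1, gamma=0.9, epsilon=0.1):
        self.q = {"C": 0.0, "D": 0.0}  # stateless Q-value per action
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def act(self):
        # Epsilon-greedy exploration: the occasional random choice is what
        # can trigger the one-shot mutual cooperation the abstract discusses.
        if random.random() < self.epsilon:
            return random.choice(["C", "D"])
        return max(self.q, key=self.q.get)

    def update(self, action, reward):
        # Standard Q-learning update in its stateless (single-state) form.
        best_next = max(self.q.values())
        self.q[action] += self.alpha * (reward + self.gamma * best_next
                                        - self.q[action])

# Two independent learners playing the repeated PD against each other.
a, b = QAgent(), QAgent()
for _ in range(1000):
    act_a, act_b = a.act(), b.act()
    a.update(act_a, PAYOFF[(act_a, act_b)])
    b.update(act_b, PAYOFF[(act_b, act_a)])
```

Cooperation is sustained exactly when repeated mutual-cooperation updates push `q["C"]` above `q["D"]`; the paper's theorem bounds how many such updates are needed.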
Keywords :
game theory; learning (artificial intelligence); multi-agent systems; Nash equilibrium; multiagent Q-learning methods; multiagent environment; prisoner's dilemma; stochastic method; utility based Q-learning; Game theory; Intelligent agent; Learning systems; Machine learning; Multiagent systems; Nash equilibrium; Stochastic processes; Toy industry;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
2007 IEEE/WIC/ACM International Conference on Intelligent Agent Technology (IAT '07)
Conference_Location :
Fremont, CA
Print_ISBN :
978-0-7695-3027-7
Type :
conf
DOI :
10.1109/IAT.2007.60
Filename :
4407275