Title :
Multiple timescales PIA for cooperative reinforcement learning based on MDP model
Author :
Yamaguchi, Tomohiro ; Imatani, Eri
Author_Institution :
Nara Nat. Coll. of Technol., Nara
Abstract :
This paper describes a new method for dynamic programming (DP) based multiagent reinforcement learning in the Markov decision process (MDP) model. In a multiagent setting, it is difficult for agents to learn cooperative actions properly because they may change their policies at the same time. To solve this problem, each agent should perform its policy improvement at a different time. We therefore propose a multiple timescales policy improvement method. We present comparative experiments between multiple timescales policy improvement and exclusive policy improvement. As a result, our methods reduced the search costs for the optimal common-payoff Nash solution.
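The coordination problem described in the abstract can be illustrated with a small sketch. It is not the authors' algorithm, because this record gives neither their timescale schedule nor their exclusive-improvement procedure. It assumes two agents, a known common-payoff MDP, deterministic policies, and exact policy evaluation. The names `evaluate`, `improve`, `multiagent_pi`, and the `schedule` functions are hypothetical. A schedule decides which agents may run a greedy policy-improvement step at each iteration, so simultaneous, turn-taking ("exclusive"), and different-timescale updates can be compared on a one-state coordination game.

```python
import numpy as np

# Illustrative sketch only: the schedules below are assumptions, not the paper's method.


def evaluate(P, R, pi1, pi2, gamma):
    """Exact policy evaluation of the joint deterministic policy (pi1, pi2).

    P[s, a1, a2, s2] is the transition probability, R[s, a1, a2] the common payoff.
    """
    n = R.shape[0]
    idx = np.arange(n)
    P_pi = P[idx, pi1, pi2]  # (n, n) transition matrix under the joint policy
    R_pi = R[idx, pi1, pi2]  # (n,) reward under the joint policy
    return np.linalg.solve(np.eye(n) - gamma * P_pi, R_pi)


def improve(P, R, V, pis, agent, gamma, tol=1e-9):
    """Greedy improvement for one agent while the other agent's policy is held fixed.

    An action is replaced only on strict improvement, so ties never cause switching.
    """
    other = 1 - agent
    new_pi = pis[agent].copy()
    for s in range(R.shape[0]):
        b = pis[other][s]
        if agent == 0:  # Q-values of agent 0's actions when agent 1 plays b
            q = R[s, :, b] + gamma * P[s, :, b, :] @ V
        else:           # Q-values of agent 1's actions when agent 0 plays b
            q = R[s, b, :] + gamma * P[s, b, :, :] @ V
        best = int(np.argmax(q))
        if q[best] > q[new_pi[s]] + tol:
            new_pi[s] = best
    return new_pi


def multiagent_pi(P, R, start, schedule, gamma=0.9, max_iters=100):
    """Two-agent policy iteration where schedule(t) lists the agents that improve at step t.

    Returns (pi1, pi2, V, iterations), or None if no stable joint policy is reached.
    """
    pis = [np.asarray(p, dtype=int).copy() for p in start]
    stable = set()  # agents that found no improvement since the last policy change
    for t in range(max_iters):
        V = evaluate(P, R, pis[0], pis[1], gamma)
        movers = schedule(t)
        # Every mover best-responds to the policies in force at the start of step t.
        new = {i: improve(P, R, V, pis, i, gamma) for i in movers}
        changed = any(not np.array_equal(new[i], pis[i]) for i in movers)
        for i in movers:
            pis[i] = new[i]
        stable = set() if changed else stable | set(movers)
        if stable == {0, 1}:
            return pis[0], pis[1], V, t + 1
    return None


# Single-state coordination game: common payoff 10 if the actions match, 0 otherwise.
P = np.ones((1, 2, 2, 1))
R = np.array([[[10.0, 0.0], [0.0, 10.0]]])
start = ([0], [1])  # miscoordinated initial joint policy

schedules = {
    "simultaneous": lambda t: [0, 1],                        # both improve every step
    "exclusive": lambda t: [t % 2],                          # agents take turns
    "timescales": lambda t: [0, 1] if t % 2 == 0 else [0],   # agent 1 improves half as often
}
results = {name: multiagent_pi(P, R, start, sch) for name, sch in schedules.items()}
for name, res in results.items():
    print(name, "no convergence" if res is None else (res[0], res[1], res[3]))
```

In this toy game, the simultaneous schedule keeps swapping between the two miscoordinated joint actions and never stabilizes. Both the turn-taking and the two-timescale schedules settle on a matching joint action, which is a common-payoff Nash solution here. The run does not reproduce the paper's experiments or its search-cost measure.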
Keywords :
Markov processes; decision theory; iterative methods; learning (artificial intelligence); multi-agent systems; Markov decision process model; cooperative reinforcement learning; dynamic programming; multiagent reinforcement learning; multiple timescales policy iteration algorithm; optimal common-payoff Nash solution; Artificial intelligence; Cost function; Dynamic programming; Educational institutions; Electronic mail; Game theory; Learning systems; Multiagent systems; Nash equilibrium; Stochastic processes; PIA; cooperative; multiple timescales;
Conference_Title :
SICE, 2007 Annual Conference
Conference_Location :
Takamatsu
Print_ISBN :
978-4-907764-27-2
Electronic_ISBN :
978-4-907764-27-2
DOI :
10.1109/SICE.2007.4421462