DocumentCode
17124
Title
QD-Learning: A Collaborative Distributed Strategy for Multi-Agent Reinforcement Learning Through Consensus + Innovations
Abstract
The paper develops QD-learning, a distributed version of reinforcement Q-learning, for multi-agent Markov decision processes (MDPs); the agents have no prior information on the global state transition and on the local agent cost statistics. The network agents minimize a network-averaged infinite-horizon discounted cost by local processing and by collaborating through mutual information exchange over a sparse (possibly stochastic) communication network. The agents respond differently (depending on their instantaneous one-stage random costs) to a global controlled state and the control actions of a remote controller. When each agent is aware only of its local online cost data and the inter-agent communication network is weakly connected, we prove that QD-learning, a consensus + innovations algorithm with mixed time-scale stochastic dynamics, converges asymptotically almost surely to the desired value function and to the optimal stationary control policy at each network agent.
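The abstract describes a consensus + innovations update with mixed time-scale weights: each agent nudges its Q-table toward its neighbors' tables (consensus) while also applying a local temporal-difference correction from its own observed cost (innovation), with the consensus weight decaying more slowly than the innovation weight. The sketch below illustrates that structure on a hypothetical 2-state, 2-action MDP with three agents on a line graph; the MDP, costs, graph, and step-size schedules are all illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy problem (all values hypothetical, for illustration only).
N_STATES, N_ACTIONS, N_AGENTS = 2, 2, 3
GAMMA = 0.8

# Global transition kernel P[s, a] -> distribution over next states.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.1, 0.9]]])

# Each agent observes only its own one-stage cost (agents differ).
COSTS = rng.uniform(0.0, 1.0, size=(N_AGENTS, N_STATES, N_ACTIONS))

# Sparse connected communication graph: line 0 - 1 - 2.
NEIGHBORS = {0: [1], 1: [0, 2], 2: [1]}

Q = np.zeros((N_AGENTS, N_STATES, N_ACTIONS))
s = 0
for t in range(50_000):
    a = int(rng.integers(N_ACTIONS))          # exploratory action
    s_next = rng.choice(N_STATES, p=P[s, a])
    # Mixed time scales: consensus weight beta decays slower than
    # innovation weight alpha (schedules chosen for this sketch).
    beta = 0.3 / (t + 1) ** 0.6
    alpha = 1.0 / (t + 1)
    Q_old = Q.copy()
    for n in range(N_AGENTS):
        # Consensus term: disagreement with neighbors on this (s, a) entry.
        consensus = sum(Q_old[n, s, a] - Q_old[l, s, a]
                        for l in NEIGHBORS[n])
        # Innovation term: local temporal-difference error.
        innovation = (COSTS[n, s, a]
                      + GAMMA * Q_old[n, s_next].min()
                      - Q_old[n, s, a])
        Q[n, s, a] = Q_old[n, s, a] - beta * consensus + alpha * innovation
    s = s_next

# After many iterations the agents' Q-tables should be nearly identical,
# approximating a Q-function for the network-averaged cost.
disagreement = float(np.max(np.abs(Q[0] - Q[2])))
```

Because the consensus weight dominates the innovation weight asymptotically, the agents reach agreement even though each injects a different local cost into its update; this is the qualitative behavior the convergence result in the abstract formalizes.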
Keywords
Markov processes; groupware; learning (artificial intelligence); minimisation; multi-agent systems; random processes; stochastic processes; telecontrol; MDP; QD-learning; collaborative distributed strategy; consensus-innovation algorithm; global controlled state; global state transition; instantaneous one-stage random costs; inter-agent communication network; local agent cost statistics; local online cost data; mixed time-scale stochastic dynamics; multi-agent Markov decision process; mutual information exchange; network agents; network-averaged infinite-horizon discounted cost; optimal stationary control policy; reinforcement Q-learning; remote controller; sparse communication network; stochastic communication network; Communication networks; Learning; Optimization; Process control; Symmetric matrices; Technological innovation;
consensus + innovations; collaborative network processing; distributed Q-learning; mixed time-scale dynamics; multi-agent stochastic control; reinforcement learning
Language
English
Journal_Title
IEEE Transactions on Signal Processing
Publisher
IEEE
ISSN
1053-587X
Type
jour
DOI
10.1109/TSP.2013.2241057
Filename
6415291
Link To Document