  • DocumentCode
    32151
  • Title
    Distributed Policy Evaluation Under Multiple Behavior Strategies
  • Author
    Valcarcel Macua, Sergio; Chen, Jianshu; Zazo, Santiago; Sayed, Ali H.
  • Author_Institution
    Dept. of Signals, Syst. & Radiocommun., Univ. Politec. de Madrid, Madrid, Spain
  • Volume
    60
  • Issue
    5
  • fYear
    2015
  • fDate
    May 2015
  • Firstpage
    1260
  • Lastpage
    1274
  • Abstract
    We apply diffusion strategies to develop a fully distributed cooperative reinforcement learning algorithm in which agents in a network communicate only with their immediate neighbors to improve predictions about their environment. The algorithm can also be applied to off-policy learning, meaning that the agents can predict the response to a behavior different from the actual policies they are following. The proposed distributed strategy is efficient, with linear complexity in both computation time and memory footprint. We provide a mean-square-error performance analysis and establish convergence under constant step-size updates, which endow the network with continuous learning capabilities. The results show a clear gain from cooperation: when the individual agents can estimate the solution, cooperation increases stability and reduces the bias and variance of the prediction error; more importantly, the network is able to approach the optimal solution even when none of the individual agents can (e.g., when the individual behavior policies restrict each agent to sampling a small portion of the state space).
  • Keywords
    computational complexity; learning (artificial intelligence); mean square error methods; computation time; continuous learning capabilities; distributed policy evaluation; fully-distributed cooperative reinforcement learning algorithm; linear complexity; mean-square-error performance analysis; memory footprint; off-policy learning; Approximation algorithms; Equations; Linear approximation; Markov processes; Prediction algorithms; Vectors; Adaptive networks; Arrow-Hurwicz algorithm; diffusion strategies; distributed processing; gradient temporal difference; mean-square-error; reinforcement learning; saddle-point problem
  • fLanguage
    English
  • Journal_Title
    IEEE Transactions on Automatic Control
  • Publisher
    IEEE
  • ISSN
    0018-9286
  • Type
    jour
  • DOI
    10.1109/TAC.2014.2368731
  • Filename
    6949624